reset_seed()
Implement Kalman model using FastAI
hai_path
Path('/home/simone/Documents/uni/Thesis/GPFA_imputation/data/FLX_DE-Hai_FLUXNET2015_FULLSET_HH_2000-2012_1-4_float32.parquet')
# hai = pd.read_parquet(hai_path)
hai = pd.read_parquet(hai_path64)
hai_era = pd.read_parquet(hai_era_path64)
# hai_era64 = pd.read_parquet(hai_era_path64)
Data Preparation
The aim of the data preparation pipeline is to:

- take the original time series and split it into time blocks
- for each block, generate a random gap (the properties of the gap still need to be determined)
- hold out some time blocks for testing

The input of the pipeline is:

- a dataframe containing all observations

The input of the model is (as illustrated below):

- observed data (potentially containing NaN where data is missing)
- a missing data mask (which indicates where the data is missing)
- the data needs to be standardized
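As a minimal illustration (with made-up values), the three pieces fit together like this:

import numpy as np
import pandas as pd

obs = pd.DataFrame({"TA": [0.3, np.nan, 0.5]})  # observed data, NaN where missing
mask = ~obs.isna()                              # missing data mask: True = data present
obs_std = (obs - obs.mean()) / obs.std()        # data standardized column-wise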
Utils
Item
MeteoImpItem
MeteoImpItem (i:int, shift:int, var_sel:list[str], gap_len:int)
item = MeteoImpItem(2, 3, 'TA', 10)
item
MeteoImpItem(i=2, shift=3, var_sel=['TA'], gap_len=10)
1) Block Index
The first step is to transform the original dataframe into blocks of a specified block_len.

Two different strategies are possible:

- contiguous blocks
- random blocks in the dataframe

For now contiguous blocks are used (one plausible formulation is sketched below).
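A sketch of the contiguous strategy, not the actual BlockIndexTransform implementation: the helper name is made up, and shift mirrors the MeteoImpItem field.

def block_index_sketch(idx, i: int, block_len: int = 200, shift: int = 0):
    # contiguous blocks: block i covers block_len consecutive rows, offset by shift
    start = i * block_len + shift
    return idx[start:start + block_len]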
MeteoImpIndex
MeteoImpIndex (index:pandas.core.indexes.datetimes.DatetimeIndex, var_sel:list[str], gap_len:int)
BlockIndexTransform
BlockIndexTransform (idx:pandas.core.indexes.datetimes.DatetimeIndex, block_len:int=200, offset=1)
divide timeseries DataFrame index into blocks
blk = BlockIndexTransform(hai.index, 10)
blk
BlockIndexTransform
(MeteoImpItem,object) -> encodes
blk(item)
MeteoImpIndex(index=DatetimeIndex(['2000-01-01 12:30:00', '2000-01-01 13:00:00',
'2000-01-01 13:30:00', '2000-01-01 14:00:00',
'2000-01-01 14:30:00', '2000-01-01 15:00:00',
'2000-01-01 15:30:00', '2000-01-01 16:00:00',
'2000-01-01 16:30:00', '2000-01-01 17:00:00'],
dtype='datetime64[ns]', name='time', freq=None), var_sel=['TA'], gap_len=10)
2) Meteo Imp Block DataFrames
Get a chunk out of the dataframes given an index
DataControl
DataControl (data:pandas.core.frame.DataFrame, control:pandas.core.frame.DataFrame, var_sel:list[str], gap_len:int)
df = hai.loc[blk(item).index]
Add lag
df.rename(columns=_rename_lag(1))
TA_lag_1 | SW_IN_lag_1 | VPD_lag_1 | |
---|---|---|---|
time | |||
2000-01-01 12:30:00 | 0.33 | 18.86 | 0.008 |
2000-01-01 13:00:00 | 0.41 | 21.10 | 0.006 |
2000-01-01 13:30:00 | 0.44 | 28.87 | 0.000 |
2000-01-01 14:00:00 | 0.48 | 24.22 | 0.000 |
2000-01-01 14:30:00 | 0.49 | 24.35 | 0.000 |
2000-01-01 15:00:00 | 0.51 | 15.68 | 0.000 |
2000-01-01 15:30:00 | 0.52 | 8.09 | 0.000 |
2000-01-01 16:00:00 | 0.57 | 6.37 | 0.000 |
2000-01-01 16:30:00 | 0.73 | 1.72 | 0.000 |
2000-01-01 17:00:00 | 0.77 | 0.06 | 0.000 |
_lag_df(df, 1)
TA_lag_1 | SW_IN_lag_1 | VPD_lag_1 | |
---|---|---|---|
time | |||
2000-01-01 12:30:00 | NaN | NaN | NaN |
2000-01-01 13:00:00 | 0.33 | 18.86 | 0.008 |
2000-01-01 13:30:00 | 0.41 | 21.10 | 0.006 |
2000-01-01 14:00:00 | 0.44 | 28.87 | 0.000 |
2000-01-01 14:30:00 | 0.48 | 24.22 | 0.000 |
2000-01-01 15:00:00 | 0.49 | 24.35 | 0.000 |
2000-01-01 15:30:00 | 0.51 | 15.68 | 0.000 |
2000-01-01 16:00:00 | 0.52 | 8.09 | 0.000 |
2000-01-01 16:30:00 | 0.57 | 6.37 | 0.000 |
2000-01-01 17:00:00 | 0.73 | 1.72 | 0.000 |
_add_lags_df(df, [1,2])
TA | SW_IN | VPD | TA_lag_1 | SW_IN_lag_1 | VPD_lag_1 | TA_lag_2 | SW_IN_lag_2 | VPD_lag_2 | |
---|---|---|---|---|---|---|---|---|---|
time | |||||||||
2000-01-01 12:30:00 | 0.33 | 18.86 | 0.008 | NaN | NaN | NaN | NaN | NaN | NaN |
2000-01-01 13:00:00 | 0.41 | 21.10 | 0.006 | 0.33 | 18.86 | 0.008 | NaN | NaN | NaN |
2000-01-01 13:30:00 | 0.44 | 28.87 | 0.000 | 0.41 | 21.10 | 0.006 | 0.33 | 18.86 | 0.008 |
2000-01-01 14:00:00 | 0.48 | 24.22 | 0.000 | 0.44 | 28.87 | 0.000 | 0.41 | 21.10 | 0.006 |
2000-01-01 14:30:00 | 0.49 | 24.35 | 0.000 | 0.48 | 24.22 | 0.000 | 0.44 | 28.87 | 0.000 |
2000-01-01 15:00:00 | 0.51 | 15.68 | 0.000 | 0.49 | 24.35 | 0.000 | 0.48 | 24.22 | 0.000 |
2000-01-01 15:30:00 | 0.52 | 8.09 | 0.000 | 0.51 | 15.68 | 0.000 | 0.49 | 24.35 | 0.000 |
2000-01-01 16:00:00 | 0.57 | 6.37 | 0.000 | 0.52 | 8.09 | 0.000 | 0.51 | 15.68 | 0.000 |
2000-01-01 16:30:00 | 0.73 | 1.72 | 0.000 | 0.57 | 6.37 | 0.000 | 0.52 | 8.09 | 0.000 |
2000-01-01 17:00:00 | 0.77 | 0.06 | 0.000 | 0.73 | 1.72 | 0.000 | 0.57 | 6.37 | 0.000 |
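The lag helpers above are used without their definitions being shown; a plausible implementation consistent with the outputs (shift the frame down by lag rows, which leaves NaN at the top, and append a _lag_{lag} suffix to the column names) could be:

import pandas as pd

def _rename_lag(lag):
    # append the lag suffix to a column name
    return lambda col: f"{col}_lag_{lag}"

def _lag_df(df, lag):
    # shift the whole frame down by `lag` rows and rename the columns
    return df.shift(lag).rename(columns=_rename_lag(lag))

def _add_lags_df(df, lags):
    # concatenate the original columns with one lagged copy per requested lag
    return pd.concat([df, *[_lag_df(df, lag) for lag in lags]], axis=1)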
BlockDfTransform
BlockDfTransform (data:pandas.core.frame.DataFrame, control:pandas.core.frame.DataFrame, control_lags:Union[int,Iterable[int]])
divide a timeseries DataFrame into blocks
blkdf = BlockDfTransform(hai, hai_era, 1)
blkdf(blk(item))
Data Control (['TA'], 10)
data
TA | SW_IN | VPD | |
---|---|---|---|
time | |||
2000-01-01 12:30:00 | 0.3300 | 18.8600 | 0.0080 |
2000-01-01 13:00:00 | 0.4100 | 21.1000 | 0.0060 |
2000-01-01 13:30:00 | 0.4400 | 28.8700 | 0.0000 |
2000-01-01 14:00:00 | 0.4800 | 24.2200 | 0.0000 |
2000-01-01 14:30:00 | 0.4900 | 24.3500 | 0.0000 |
2000-01-01 15:00:00 | 0.5100 | 15.6800 | 0.0000 |
2000-01-01 15:30:00 | 0.5200 | 8.0900 | 0.0000 |
2000-01-01 16:00:00 | 0.5700 | 6.3700 | 0.0000 |
2000-01-01 16:30:00 | 0.7300 | 1.7200 | 0.0000 |
2000-01-01 17:00:00 | 0.7700 | 0.0600 | 0.0000 |
control
TA_ERA | SW_IN_ERA | VPD_ERA | TA_ERA_lag_1 | SW_IN_ERA_lag_1 | VPD_ERA_lag_1 | |
---|---|---|---|---|---|---|
time | ||||||
2000-01-01 12:30:00 | 1.1160 | 26.1870 | 0.5940 | 0.9950 | 25.5130 | 0.5920 |
2000-01-01 13:00:00 | 1.2370 | 25.9150 | 0.5960 | 1.1160 | 26.1870 | 0.5940 |
2000-01-01 13:30:00 | 1.3580 | 15.7740 | 0.5970 | 1.2370 | 25.9150 | 0.5960 |
2000-01-01 14:00:00 | 1.4090 | 14.4120 | 0.5830 | 1.3580 | 15.7740 | 0.5970 |
2000-01-01 14:30:00 | 1.4590 | 12.4860 | 0.5690 | 1.4090 | 14.4120 | 0.5830 |
2000-01-01 15:00:00 | 1.5100 | 10.0280 | 0.5550 | 1.4590 | 12.4860 | 0.5690 |
2000-01-01 15:30:00 | 1.5610 | 7.0820 | 0.5410 | 1.5100 | 10.0280 | 0.5550 |
2000-01-01 16:00:00 | 1.6110 | 3.6960 | 0.5270 | 1.5610 | 7.0820 | 0.5410 |
2000-01-01 16:30:00 | 1.6620 | 0.0000 | 0.5130 | 1.6110 | 3.6960 | 0.5270 |
2000-01-01 17:00:00 | 1.8450 | 0.0000 | 0.5130 | 1.6620 | 0.0000 | 0.5130 |
Taking a day in the summer, so the variables have higher values.
blkdf(blk(MeteoImpItem(800, 0, 'TA', 10))).data
TA | SW_IN | VPD | |
---|---|---|---|
time | |||
2000-06-15 17:00:00 | 14.22 | 224.80 | 5.799 |
2000-06-15 17:30:00 | 14.11 | 195.28 | 6.577 |
2000-06-15 18:00:00 | 14.23 | 244.17 | 6.931 |
2000-06-15 18:30:00 | 14.40 | 253.92 | 7.286 |
2000-06-15 19:00:00 | 14.09 | 177.31 | 7.251 |
2000-06-15 19:30:00 | 13.71 | 97.07 | 6.683 |
2000-06-15 20:00:00 | 13.08 | 39.71 | 5.851 |
2000-06-15 20:30:00 | 12.41 | 10.65 | 5.254 |
2000-06-15 21:00:00 | 12.27 | 0.32 | 5.164 |
2000-06-15 21:30:00 | 12.20 | 0.00 | 5.037 |
tfms1 = TfmdLists([MeteoImpItem(800+i, 0, 'TA', 10) for i in range(3)], [BlockIndexTransform(hai.index, 10), BlockDfTransform(hai, hai_era, control_lags=1)])
tfms1[0]
Data Control (['TA'], 10)
data
TA | SW_IN | VPD | |
---|---|---|---|
time | |||
2000-06-15 17:00:00 | 14.2200 | 224.8000 | 5.7990 |
2000-06-15 17:30:00 | 14.1100 | 195.2800 | 6.5770 |
2000-06-15 18:00:00 | 14.2300 | 244.1700 | 6.9310 |
2000-06-15 18:30:00 | 14.4000 | 253.9200 | 7.2860 |
2000-06-15 19:00:00 | 14.0900 | 177.3100 | 7.2510 |
2000-06-15 19:30:00 | 13.7100 | 97.0700 | 6.6830 |
2000-06-15 20:00:00 | 13.0800 | 39.7100 | 5.8510 |
2000-06-15 20:30:00 | 12.4100 | 10.6500 | 5.2540 |
2000-06-15 21:00:00 | 12.2700 | 0.3200 | 5.1640 |
2000-06-15 21:30:00 | 12.2000 | 0.0000 | 5.0370 |
control
TA_ERA | SW_IN_ERA | VPD_ERA | TA_ERA_lag_1 | SW_IN_ERA_lag_1 | VPD_ERA_lag_1 | |
---|---|---|---|---|---|---|
time | ||||||
2000-06-15 17:00:00 | 15.0500 | 255.1930 | 5.1020 | 15.1390 | 287.1000 | 4.9000 |
2000-06-15 17:30:00 | 14.9610 | 221.4270 | 5.3050 | 15.0500 | 255.1930 | 5.1020 |
2000-06-15 18:00:00 | 14.8720 | 186.3800 | 5.5070 | 14.9610 | 221.4270 | 5.3050 |
2000-06-15 18:30:00 | 14.7830 | 150.6500 | 5.7100 | 14.8720 | 186.3800 | 5.5070 |
2000-06-15 19:00:00 | 14.6940 | 114.8490 | 5.9120 | 14.7830 | 150.6500 | 5.7100 |
2000-06-15 19:30:00 | 14.6060 | 34.7280 | 6.1140 | 14.6940 | 114.8490 | 5.9120 |
2000-06-15 20:00:00 | 14.3800 | 19.8430 | 6.0740 | 14.6060 | 34.7280 | 6.1140 |
2000-06-15 20:30:00 | 14.1550 | 5.7120 | 6.0340 | 14.3800 | 19.8430 | 6.0740 |
2000-06-15 21:00:00 | 13.9290 | 0.0000 | 5.9940 | 14.1550 | 5.7120 | 6.0340 |
2000-06-15 21:30:00 | 13.7040 | 0.0000 | 5.9540 | 13.9290 | 0.0000 | 5.9940 |
3) Gaps
adds a mask which includes a random gap
Make random Gap
idx = L(*tfms1[0].data.columns).argwhere(lambda x: x in ['TA','SW_IN'])
mask = np.ones_like(tfms1[0].data, dtype=bool)
mask
array([[ True, True, True],
[ True, True, True],
[ True, True, True],
[ True, True, True],
[ True, True, True],
[ True, True, True],
[ True, True, True],
[ True, True, True],
[ True, True, True],
[ True, True, True]])
def _make_random_gap(
    gap_length: int, # The length of the gap
    total_length: int, # The total number of observations
    gap_start: int # Optional start of gap
) -> np.ndarray: # [total_length] array of bools to indicate if the data is missing or not
    "Add a continuous gap of given length at random position"
    if gap_length >= total_length:
        return np.repeat(True, total_length)
    return np.hstack([
        np.repeat(False, gap_start),
        np.repeat(True, gap_length),
        np.repeat(False, total_length - (gap_length + gap_start))
    ])
gap = _make_random_gap(2, 10, 2)
gap
array([False, False, True, True, False, False, False, False, False,
False])
np.argwhere(gap)
array([[2],
[3]])
mask[np.argwhere(gap), idx] = False
mask
array([[ True, True, True],
[ True, True, True],
[False, False, True],
[False, False, True],
[ True, True, True],
[ True, True, True],
[ True, True, True],
[ True, True, True],
[ True, True, True],
[ True, True, True]])
mask[gap]
array([[False, False, True],
[False, False, True]])
Add Gap Transform
MeteoImpDf
MeteoImpDf (data:pandas.core.frame.DataFrame, mask:pandas.core.frame.DataFrame, control:pandas.core.frame.DataFrame)
AddGapTransform
AddGapTransform ()
Adds a random gap to a dataframe
a_gap = AddGapTransform(1)
a_gap
AddGapTransform
(DataControl,object) -> encodes
a_gap(tfms1[0])
Meteo Imp Df
data
TA | SW_IN | VPD | |
---|---|---|---|
time | |||
2000-06-15 17:00:00 | 14.2200 | 224.8000 | 5.7990 |
2000-06-15 17:30:00 | 14.1100 | 195.2800 | 6.5770 |
2000-06-15 18:00:00 | 14.2300 | 244.1700 | 6.9310 |
2000-06-15 18:30:00 | 14.4000 | 253.9200 | 7.2860 |
2000-06-15 19:00:00 | 14.0900 | 177.3100 | 7.2510 |
2000-06-15 19:30:00 | 13.7100 | 97.0700 | 6.6830 |
2000-06-15 20:00:00 | 13.0800 | 39.7100 | 5.8510 |
2000-06-15 20:30:00 | 12.4100 | 10.6500 | 5.2540 |
2000-06-15 21:00:00 | 12.2700 | 0.3200 | 5.1640 |
2000-06-15 21:30:00 | 12.2000 | 0.0000 | 5.0370 |
mask
TA | SW_IN | VPD | |
---|---|---|---|
time | |||
2000-06-15 17:00:00 | False | True | True |
2000-06-15 17:30:00 | False | True | True |
2000-06-15 18:00:00 | False | True | True |
2000-06-15 18:30:00 | False | True | True |
2000-06-15 19:00:00 | False | True | True |
2000-06-15 19:30:00 | False | True | True |
2000-06-15 20:00:00 | False | True | True |
2000-06-15 20:30:00 | False | True | True |
2000-06-15 21:00:00 | False | True | True |
2000-06-15 21:30:00 | False | True | True |
control
TA_ERA | SW_IN_ERA | VPD_ERA | TA_ERA_lag_1 | SW_IN_ERA_lag_1 | VPD_ERA_lag_1 | |
---|---|---|---|---|---|---|
time | ||||||
2000-06-15 17:00:00 | 15.0500 | 255.1930 | 5.1020 | 15.1390 | 287.1000 | 4.9000 |
2000-06-15 17:30:00 | 14.9610 | 221.4270 | 5.3050 | 15.0500 | 255.1930 | 5.1020 |
2000-06-15 18:00:00 | 14.8720 | 186.3800 | 5.5070 | 14.9610 | 221.4270 | 5.3050 |
2000-06-15 18:30:00 | 14.7830 | 150.6500 | 5.7100 | 14.8720 | 186.3800 | 5.5070 |
2000-06-15 19:00:00 | 14.6940 | 114.8490 | 5.9120 | 14.7830 | 150.6500 | 5.7100 |
2000-06-15 19:30:00 | 14.6060 | 34.7280 | 6.1140 | 14.6940 | 114.8490 | 5.9120 |
2000-06-15 20:00:00 | 14.3800 | 19.8430 | 6.0740 | 14.6060 | 34.7280 | 6.1140 |
2000-06-15 20:30:00 | 14.1550 | 5.7120 | 6.0340 | 14.3800 | 19.8430 | 6.0740 |
2000-06-15 21:00:00 | 13.9290 | 0.0000 | 5.9940 | 14.1550 | 5.7120 | 6.0340 |
2000-06-15 21:30:00 | 13.7040 | 0.0000 | 5.9540 | 13.9290 | 0.0000 | 5.9940 |
tfms2 = TfmdLists(tfms1.items, [*tfms1.fs, AddGapTransform(5)])
Tidy
m_df = a_gap(tfms1[0])
MeteoImpDf.tidy
MeteoImpDf.tidy (control_map:Optional[dict[str,str]]=None)
Type | Default | Details | |
---|---|---|---|
control_map | Optional | None | mapping from control var names to obs names |
m_df.tidy()
time | variable | value | is_present | |
---|---|---|---|---|
0 | 2000-06-15 17:00:00 | TA | 14.220 | False |
1 | 2000-06-15 17:30:00 | TA | 14.110 | False |
2 | 2000-06-15 18:00:00 | TA | 14.230 | False |
3 | 2000-06-15 18:30:00 | TA | 14.400 | False |
4 | 2000-06-15 19:00:00 | TA | 14.090 | False |
5 | 2000-06-15 19:30:00 | TA | 13.710 | False |
6 | 2000-06-15 20:00:00 | TA | 13.080 | False |
7 | 2000-06-15 20:30:00 | TA | 12.410 | False |
8 | 2000-06-15 21:00:00 | TA | 12.270 | False |
9 | 2000-06-15 21:30:00 | TA | 12.200 | False |
10 | 2000-06-15 17:00:00 | SW_IN | 224.800 | True |
11 | 2000-06-15 17:30:00 | SW_IN | 195.280 | True |
12 | 2000-06-15 18:00:00 | SW_IN | 244.170 | True |
13 | 2000-06-15 18:30:00 | SW_IN | 253.920 | True |
14 | 2000-06-15 19:00:00 | SW_IN | 177.310 | True |
15 | 2000-06-15 19:30:00 | SW_IN | 97.070 | True |
16 | 2000-06-15 20:00:00 | SW_IN | 39.710 | True |
17 | 2000-06-15 20:30:00 | SW_IN | 10.650 | True |
18 | 2000-06-15 21:00:00 | SW_IN | 0.320 | True |
19 | 2000-06-15 21:30:00 | SW_IN | 0.000 | True |
20 | 2000-06-15 17:00:00 | VPD | 5.799 | True |
21 | 2000-06-15 17:30:00 | VPD | 6.577 | True |
22 | 2000-06-15 18:00:00 | VPD | 6.931 | True |
23 | 2000-06-15 18:30:00 | VPD | 7.286 | True |
24 | 2000-06-15 19:00:00 | VPD | 7.251 | True |
25 | 2000-06-15 19:30:00 | VPD | 6.683 | True |
26 | 2000-06-15 20:00:00 | VPD | 5.851 | True |
27 | 2000-06-15 20:30:00 | VPD | 5.254 | True |
28 | 2000-06-15 21:00:00 | VPD | 5.164 | True |
29 | 2000-06-15 21:30:00 | VPD | 5.037 | True |
m_df.tidy(control_map={'TA_ERA': 'TA'})
time | variable | value | control | is_present | |
---|---|---|---|---|---|
0 | 2000-06-15 17:00:00 | TA | 14.220 | 15.050 | False |
1 | 2000-06-15 17:30:00 | TA | 14.110 | 14.961 | False |
2 | 2000-06-15 18:00:00 | TA | 14.230 | 14.872 | False |
3 | 2000-06-15 18:30:00 | TA | 14.400 | 14.783 | False |
4 | 2000-06-15 19:00:00 | TA | 14.090 | 14.694 | False |
5 | 2000-06-15 19:30:00 | TA | 13.710 | 14.606 | False |
6 | 2000-06-15 20:00:00 | TA | 13.080 | 14.380 | False |
7 | 2000-06-15 20:30:00 | TA | 12.410 | 14.155 | False |
8 | 2000-06-15 21:00:00 | TA | 12.270 | 13.929 | False |
9 | 2000-06-15 21:30:00 | TA | 12.200 | 13.704 | False |
10 | 2000-06-15 17:00:00 | SW_IN | 224.800 | NaN | True |
11 | 2000-06-15 17:30:00 | SW_IN | 195.280 | NaN | True |
12 | 2000-06-15 18:00:00 | SW_IN | 244.170 | NaN | True |
13 | 2000-06-15 18:30:00 | SW_IN | 253.920 | NaN | True |
14 | 2000-06-15 19:00:00 | SW_IN | 177.310 | NaN | True |
15 | 2000-06-15 19:30:00 | SW_IN | 97.070 | NaN | True |
16 | 2000-06-15 20:00:00 | SW_IN | 39.710 | NaN | True |
17 | 2000-06-15 20:30:00 | SW_IN | 10.650 | NaN | True |
18 | 2000-06-15 21:00:00 | SW_IN | 0.320 | NaN | True |
19 | 2000-06-15 21:30:00 | SW_IN | 0.000 | NaN | True |
20 | 2000-06-15 17:00:00 | VPD | 5.799 | NaN | True |
21 | 2000-06-15 17:30:00 | VPD | 6.577 | NaN | True |
22 | 2000-06-15 18:00:00 | VPD | 6.931 | NaN | True |
23 | 2000-06-15 18:30:00 | VPD | 7.286 | NaN | True |
24 | 2000-06-15 19:00:00 | VPD | 7.251 | NaN | True |
25 | 2000-06-15 19:30:00 | VPD | 6.683 | NaN | True |
26 | 2000-06-15 20:00:00 | VPD | 5.851 | NaN | True |
27 | 2000-06-15 20:30:00 | VPD | 5.254 | NaN | True |
28 | 2000-06-15 21:00:00 | VPD | 5.164 | NaN | True |
29 | 2000-06-15 21:30:00 | VPD | 5.037 | NaN | True |
m_df.tidy(control_map=hai_control).head()
time | variable | value | control | is_present | |
---|---|---|---|---|---|
0 | 2000-06-15 17:00:00 | TA | 14.22 | 15.050 | False |
1 | 2000-06-15 17:30:00 | TA | 14.11 | 14.961 | False |
2 | 2000-06-15 18:00:00 | TA | 14.23 | 14.872 | False |
3 | 2000-06-15 18:30:00 | TA | 14.40 | 14.783 | False |
4 | 2000-06-15 19:00:00 | TA | 14.09 | 14.694 | False |
Plotting
Rug
plot_rug(m_df.tidy())
df = m_df.tidy()
df = df[df.variable=="TA"].copy()
df['row_number'] = df.reset_index().index
df
time | variable | value | is_present | row_number | |
---|---|---|---|---|---|
0 | 2000-06-15 17:00:00 | TA | 14.22 | False | 0 |
1 | 2000-06-15 17:30:00 | TA | 14.11 | False | 1 |
2 | 2000-06-15 18:00:00 | TA | 14.23 | False | 2 |
3 | 2000-06-15 18:30:00 | TA | 14.40 | False | 3 |
4 | 2000-06-15 19:00:00 | TA | 14.09 | False | 4 |
5 | 2000-06-15 19:30:00 | TA | 13.71 | False | 5 |
6 | 2000-06-15 20:00:00 | TA | 13.08 | False | 6 |
7 | 2000-06-15 20:30:00 | TA | 12.41 | False | 7 |
8 | 2000-06-15 21:00:00 | TA | 12.27 | False | 8 |
9 | 2000-06-15 21:30:00 | TA | 12.20 | False | 9 |
df.iloc[1]
time 2000-06-15 17:30:00
variable TA
value 14.11
is_present False
row_number 1
Name: 1, dtype: object
2, "is_present"] = True df.loc[
df
time | variable | value | is_present | row_number | |
---|---|---|---|---|---|
0 | 2000-06-15 17:00:00 | TA | 14.22 | False | 0 |
1 | 2000-06-15 17:30:00 | TA | 14.11 | False | 1 |
2 | 2000-06-15 18:00:00 | TA | 14.23 | True | 2 |
3 | 2000-06-15 18:30:00 | TA | 14.40 | False | 3 |
4 | 2000-06-15 19:00:00 | TA | 14.09 | False | 4 |
5 | 2000-06-15 19:30:00 | TA | 13.71 | False | 5 |
6 | 2000-06-15 20:00:00 | TA | 13.08 | False | 6 |
7 | 2000-06-15 20:30:00 | TA | 12.41 | False | 7 |
8 | 2000-06-15 21:00:00 | TA | 12.27 | False | 8 |
9 | 2000-06-15 21:30:00 | TA | 12.20 | False | 9 |
i = 1
prev, curr, _next = df.iloc[i-1], df.iloc[i], df.iloc[i+1]
prev, curr, _next
(time 2000-06-15 17:00:00
variable TA
value 14.22
is_present False
row_number 0
Name: 0, dtype: object,
time 2000-06-15 17:30:00
variable TA
value 14.11
is_present False
row_number 1
Name: 1, dtype: object,
time 2000-06-15 18:00:00
variable TA
value 14.23
is_present True
row_number 2
Name: 2, dtype: object)
df
time | variable | value | is_present | row_number | |
---|---|---|---|---|---|
0 | 2000-06-15 17:00:00 | TA | 14.22 | False | 0 |
1 | 2000-06-15 17:30:00 | TA | 14.11 | False | 1 |
2 | 2000-06-15 18:00:00 | TA | 14.23 | True | 2 |
3 | 2000-06-15 18:30:00 | TA | 14.40 | False | 3 |
4 | 2000-06-15 19:00:00 | TA | 14.09 | False | 4 |
5 | 2000-06-15 19:30:00 | TA | 13.71 | False | 5 |
6 | 2000-06-15 20:00:00 | TA | 13.08 | False | 6 |
7 | 2000-06-15 20:30:00 | TA | 12.41 | False | 7 |
8 | 2000-06-15 21:00:00 | TA | 12.27 | False | 8 |
9 | 2000-06-15 21:30:00 | TA | 12.20 | False | 9 |
Missing Area
for i in range(len(df)):
    # handle boundaries
    prev = df.iloc[i-1].is_present if i>0 else True
    _next = df.iloc[i+1].is_present if i<(len(df)-1) else True
    curr = df.iloc[i]
    if not curr.is_present and prev:
        print("gap start", curr.time)
    if not curr.is_present and _next:
        print("gap end", curr.time)
gap start 2000-06-15 17:00:00
gap end 2000-06-15 17:30:00
gap start 2000-06-15 18:30:00
gap end 2000-06-15 21:30:00
find_gap_limits
find_gap_limits (df)
find_gap_limits(df)
gap_start | gap_end | |
---|---|---|
0 | 2000-06-15 17:00:00 | 2000-06-15 17:30:00 |
1 | 2000-06-15 18:30:00 | 2000-06-15 21:30:00 |
plot_missing_area
plot_missing_area (df, sel=Parameter('param_2', SelectionParameter({ bind: 'scales', name: 'param_2', select: IntervalSelectionConfig({ type: 'interval' }) })), props={})
plot_missing_area(df)
Points
plot_points
plot_points (df, y='value', y_label='', sel=Parameter('param_3', SelectionParameter({ bind: 'scales', name: 'param_3', select: IntervalSelectionConfig({ type: 'interval' }) })), props={})
plot_points(m_df.tidy())
Line
plot_line(m_df.tidy())
Control
="value") plot_control(m_df.tidy(), y
Errorband
plot_error(m_df.tidy().assign(std=5))
Variable
plot_variable
plot_variable (df, variable, ys=['value', 'value', 'control'], title='', y_label='', sel=None, error=False, point=True, gap_area=True, control=False, props={})
"TA", title="title TA", gap_area=False) plot_variable(m_df.tidy(),
=hai_control), "TA", title="title TA", control=True) plot_variable(m_df.tidy(control_map
=.5), "TA", title="title TA", error=True) plot_variable(m_df.tidy().assign(std
=.5), "TA", title="title TA", error=True, point=False, gap_area=False) plot_variable(m_df.tidy().assign(std
Facet
facet_variable
facet_variable (df, n_cols:int=3, bind_interaction:bool=True, units=None, ys=['value', 'value', 'control'], title='', sel=None, error=False, point=True, gap_area=True, control=False, props={})
Plot all values of the column variable
in different subplots
Type | Default | Details | |
---|---|---|---|
df | tidy dataframe | ||
n_cols | int | 3 | |
bind_interaction | bool | True | Whether the sub-plots for each variable should be connected for zooming/panning |
units | NoneType | None | |
ys | list | [‘value’, ‘value’, ‘control’] | |
title | str | ||
sel | NoneType | None | |
error | bool | False | |
point | bool | True | |
gap_area | bool | True | |
control | bool | False | |
props | dict | {} | |
Returns | Chart |
Show
MeteoImpDf.show
MeteoImpDf.show (ax=None, ctx=None, n_cols:int=3, bind_interaction:bool=True, props:dict=None)
Type | Default | Details | |
---|---|---|---|
ax | NoneType | None | |
ctx | NoneType | None | |
n_cols | int | 3 | |
bind_interaction | bool | True | Whether the sub-plots for each variable should be connected for zooming/panning |
props | dict | None | additional properties (eg. size) for altair plot |
Returns | Chart |
m_df.show(bind_interaction=False)
tfms2[0].show()
tfms2[2].show()
4) To Tensor
This needs to handle both initialization with a list of items and the case where the first item is itself a sequence of items.
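A hypothetical sketch of that dispatch (the actual constructor is not shown here):

def _parse_items_sketch(*args):
    # accept either (data, mask, control) directly, or a single sequence
    # that already holds the three items
    if len(args) == 1 and isinstance(args[0], (list, tuple)):
        args = tuple(args[0])
    data, mask, control = args
    return data, mask, control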
MeteoImpTensor
MeteoImpTensor (*args)
All the operations on a read-only sequence.
Concrete subclasses must override new or init, getitem, and len.
MeteoImpDf2Tensor
MeteoImpDf2Tensor (enc=None, dec=None, split_idx=None, order=None)
Delegates (__call__, decode, setup) to (encodes, decodes, setups) if split_idx matches
to_t = MeteoImpDf2Tensor()
to_t.setup(tfms2)
to_t(tfms2[0])
data
tensor([[ 14.2200, 224.8000, 5.7990], [ 14.1100, 195.2800, 6.5770], [ 14.2300, 244.1700, 6.9310], [ 14.4000, 253.9200, 7.2860], [ 14.0900, 177.3100, 7.2510], [ 13.7100, 97.0700, 6.6830], [ 13.0800, 39.7100, 5.8510], [ 12.4100, 10.6500, 5.2540], [ 12.2700, 0.3200, 5.1640], [ 12.2000, 0.0000, 5.0370]], dtype=torch.float64)
mask
tensor([[False, True, True], [False, True, True], [False, True, True], [False, True, True], [False, True, True], [False, True, True], [False, True, True], [False, True, True], [False, True, True], [False, True, True]])
control
tensor([[ 15.0500, 255.1930, 5.1020, 15.1390, 287.1000, 4.9000], [ 14.9610, 221.4270, 5.3050, 15.0500, 255.1930, 5.1020], [ 14.8720, 186.3800, 5.5070, 14.9610, 221.4270, 5.3050], [ 14.7830, 150.6500, 5.7100, 14.8720, 186.3800, 5.5070], [ 14.6940, 114.8490, 5.9120, 14.7830, 150.6500, 5.7100], [ 14.6060, 34.7280, 6.1140, 14.6940, 114.8490, 5.9120], [ 14.3800, 19.8430, 6.0740, 14.6060, 34.7280, 6.1140], [ 14.1550, 5.7120, 6.0340, 14.3800, 19.8430, 6.0740], [ 13.9290, 0.0000, 5.9940, 14.1550, 5.7120, 6.0340], [ 13.7040, 0.0000, 5.9540, 13.9290, 0.0000, 5.9940]], dtype=torch.float64)
to_t.decode(to_t(tfms2[0]));
tfms2[0]
Meteo Imp Df
data
TA | SW_IN | VPD | |
---|---|---|---|
time | |||
2000-06-15 17:00:00 | 14.2200 | 224.8000 | 5.7990 |
2000-06-15 17:30:00 | 14.1100 | 195.2800 | 6.5770 |
2000-06-15 18:00:00 | 14.2300 | 244.1700 | 6.9310 |
2000-06-15 18:30:00 | 14.4000 | 253.9200 | 7.2860 |
2000-06-15 19:00:00 | 14.0900 | 177.3100 | 7.2510 |
2000-06-15 19:30:00 | 13.7100 | 97.0700 | 6.6830 |
2000-06-15 20:00:00 | 13.0800 | 39.7100 | 5.8510 |
2000-06-15 20:30:00 | 12.4100 | 10.6500 | 5.2540 |
2000-06-15 21:00:00 | 12.2700 | 0.3200 | 5.1640 |
2000-06-15 21:30:00 | 12.2000 | 0.0000 | 5.0370 |
mask
TA | SW_IN | VPD | |
---|---|---|---|
time | |||
2000-06-15 17:00:00 | False | True | True |
2000-06-15 17:30:00 | False | True | True |
2000-06-15 18:00:00 | False | True | True |
2000-06-15 18:30:00 | False | True | True |
2000-06-15 19:00:00 | False | True | True |
2000-06-15 19:30:00 | False | True | True |
2000-06-15 20:00:00 | False | True | True |
2000-06-15 20:30:00 | False | True | True |
2000-06-15 21:00:00 | False | True | True |
2000-06-15 21:30:00 | False | True | True |
control
TA_ERA | SW_IN_ERA | VPD_ERA | TA_ERA_lag_1 | SW_IN_ERA_lag_1 | VPD_ERA_lag_1 | |
---|---|---|---|---|---|---|
time | ||||||
2000-06-15 17:00:00 | 15.0500 | 255.1930 | 5.1020 | 15.1390 | 287.1000 | 4.9000 |
2000-06-15 17:30:00 | 14.9610 | 221.4270 | 5.3050 | 15.0500 | 255.1930 | 5.1020 |
2000-06-15 18:00:00 | 14.8720 | 186.3800 | 5.5070 | 14.9610 | 221.4270 | 5.3050 |
2000-06-15 18:30:00 | 14.7830 | 150.6500 | 5.7100 | 14.8720 | 186.3800 | 5.5070 |
2000-06-15 19:00:00 | 14.6940 | 114.8490 | 5.9120 | 14.7830 | 150.6500 | 5.7100 |
2000-06-15 19:30:00 | 14.6060 | 34.7280 | 6.1140 | 14.6940 | 114.8490 | 5.9120 |
2000-06-15 20:00:00 | 14.3800 | 19.8430 | 6.0740 | 14.6060 | 34.7280 | 6.1140 |
2000-06-15 20:30:00 | 14.1550 | 5.7120 | 6.0340 | 14.3800 | 19.8430 | 6.0740 |
2000-06-15 21:00:00 | 13.9290 | 0.0000 | 5.9940 | 14.1550 | 5.7120 | 6.0340 |
2000-06-15 21:30:00 | 13.7040 | 0.0000 | 5.9540 | 13.9290 | 0.0000 | 5.9940 |
tfms3 = TfmdLists(tfms1.items, [*tfms2.fs, MeteoImpDf2Tensor()])
tfms3[0]
data
tensor([[ 14.2200, 224.8000, 5.7990], [ 14.1100, 195.2800, 6.5770], [ 14.2300, 244.1700, 6.9310], [ 14.4000, 253.9200, 7.2860], [ 14.0900, 177.3100, 7.2510], [ 13.7100, 97.0700, 6.6830], [ 13.0800, 39.7100, 5.8510], [ 12.4100, 10.6500, 5.2540], [ 12.2700, 0.3200, 5.1640], [ 12.2000, 0.0000, 5.0370]], dtype=torch.float64)
mask
tensor([[False, True, True], [False, True, True], [False, True, True], [False, True, True], [False, True, True], [False, True, True], [False, True, True], [False, True, True], [False, True, True], [False, True, True]])
control
tensor([[ 15.0500, 255.1930, 5.1020, 15.1390, 287.1000, 4.9000], [ 14.9610, 221.4270, 5.3050, 15.0500, 255.1930, 5.1020], [ 14.8720, 186.3800, 5.5070, 14.9610, 221.4270, 5.3050], [ 14.7830, 150.6500, 5.7100, 14.8720, 186.3800, 5.5070], [ 14.6940, 114.8490, 5.9120, 14.7830, 150.6500, 5.7100], [ 14.6060, 34.7280, 6.1140, 14.6940, 114.8490, 5.9120], [ 14.3800, 19.8430, 6.0740, 14.6060, 34.7280, 6.1140], [ 14.1550, 5.7120, 6.0340, 14.3800, 19.8430, 6.0740], [ 13.9290, 0.0000, 5.9940, 14.1550, 5.7120, 6.0340], [ 13.7040, 0.0000, 5.9540, 13.9290, 0.0000, 5.9940]], dtype=torch.float64)
type(tfms3[0])
__main__.MeteoImpTensor
5) Normalize
get_stats
get_stats (df, repeat=1, device='cpu')
MeteoImpNormalize
MeteoImpNormalize (mean_data, std_data, mean_control, std_control)
Normalize/denorm MeteoImpTensor column-wise
norm = MeteoImpNormalize(*get_stats(hai), *get_stats(hai_era, 2))
tfms3[0]
data
tensor([[ 14.2200, 224.8000, 5.7990], [ 14.1100, 195.2800, 6.5770], [ 14.2300, 244.1700, 6.9310], [ 14.4000, 253.9200, 7.2860], [ 14.0900, 177.3100, 7.2510], [ 13.7100, 97.0700, 6.6830], [ 13.0800, 39.7100, 5.8510], [ 12.4100, 10.6500, 5.2540], [ 12.2700, 0.3200, 5.1640], [ 12.2000, 0.0000, 5.0370]], dtype=torch.float64)
mask
tensor([[False, True, True], [False, True, True], [False, True, True], [False, True, True], [False, True, True], [False, True, True], [False, True, True], [False, True, True], [False, True, True], [False, True, True]])
control
tensor([[ 15.0500, 255.1930, 5.1020, 15.1390, 287.1000, 4.9000], [ 14.9610, 221.4270, 5.3050, 15.0500, 255.1930, 5.1020], [ 14.8720, 186.3800, 5.5070, 14.9610, 221.4270, 5.3050], [ 14.7830, 150.6500, 5.7100, 14.8720, 186.3800, 5.5070], [ 14.6940, 114.8490, 5.9120, 14.7830, 150.6500, 5.7100], [ 14.6060, 34.7280, 6.1140, 14.6940, 114.8490, 5.9120], [ 14.3800, 19.8430, 6.0740, 14.6060, 34.7280, 6.1140], [ 14.1550, 5.7120, 6.0340, 14.3800, 19.8430, 6.0740], [ 13.9290, 0.0000, 5.9940, 14.1550, 5.7120, 6.0340], [ 13.7040, 0.0000, 5.9540, 13.9290, 0.0000, 5.9940]], dtype=torch.float64)
norm
MeteoImpNormalize -- {'mean_data': tensor([ 8.3339, 120.9578, 3.3807], dtype=torch.float64), 'std_data': tensor([ 7.9246, 204.0026, 4.3684], dtype=torch.float64), 'mean_control': tensor([ 8.1948, 120.6864, 3.3253, 8.1948, 120.6864, 3.3253], dtype=torch.float64), 'std_control': tensor([ 7.5459, 187.1730, 3.6871, 7.5459, 187.1730, 3.6871], dtype=torch.float64)}
(MeteoImpTensor,object) -> encodes
(MeteoImpTensor,object) -> decodes
norm(tfms3[0])
data
tensor([[ 0.7428, 0.5090, 0.5536], [ 0.7289, 0.3643, 0.7317], [ 0.7440, 0.6040, 0.8127], [ 0.7655, 0.6518, 0.8940], [ 0.7264, 0.2762, 0.8860], [ 0.6784, -0.1171, 0.7560], [ 0.5989, -0.3983, 0.5655], [ 0.5144, -0.5407, 0.4288], [ 0.4967, -0.5914, 0.4082], [ 0.4879, -0.5929, 0.3792]], dtype=torch.float64)
mask
tensor([[False, True, True], [False, True, True], [False, True, True], [False, True, True], [False, True, True], [False, True, True], [False, True, True], [False, True, True], [False, True, True], [False, True, True]])
control
tensor([[ 0.9085, 0.7186, 0.4819, 0.9203, 0.8891, 0.4271], [ 0.8967, 0.5382, 0.5369, 0.9085, 0.7186, 0.4819], [ 0.8849, 0.3510, 0.5917, 0.8967, 0.5382, 0.5369], [ 0.8731, 0.1601, 0.6468, 0.8849, 0.3510, 0.5917], [ 0.8613, -0.0312, 0.7015, 0.8731, 0.1601, 0.6468], [ 0.8496, -0.4592, 0.7563, 0.8613, -0.0312, 0.7015], [ 0.8197, -0.5388, 0.7455, 0.8496, -0.4592, 0.7563], [ 0.7899, -0.6143, 0.7346, 0.8197, -0.5388, 0.7455], [ 0.7599, -0.6448, 0.7238, 0.7899, -0.6143, 0.7346], [ 0.7301, -0.6448, 0.7129, 0.7599, -0.6448, 0.7238]], dtype=torch.float64)
test_close(norm.decode(norm(tfms3[0]))[0], tfms3[0][0], eps=2e-5)
Test that NormalsParams decode actually works
tfms4 = TfmdLists(tfms3.items, [*tfms3.fs, MeteoImpNormalize(*get_stats(hai), *get_stats(hai_era, 2))])
tfms4[0]
data
tensor([[ 0.7428, 0.5090, 0.5536], [ 0.7289, 0.3643, 0.7317], [ 0.7440, 0.6040, 0.8127], [ 0.7655, 0.6518, 0.8940], [ 0.7264, 0.2762, 0.8860], [ 0.6784, -0.1171, 0.7560], [ 0.5989, -0.3983, 0.5655], [ 0.5144, -0.5407, 0.4288], [ 0.4967, -0.5914, 0.4082], [ 0.4879, -0.5929, 0.3792]], dtype=torch.float64)
mask
tensor([[False, True, True], [False, True, True], [False, True, True], [False, True, True], [False, True, True], [False, True, True], [False, True, True], [False, True, True], [False, True, True], [False, True, True]])
control
tensor([[ 0.9085, 0.7186, 0.4819, 0.9203, 0.8891, 0.4271], [ 0.8967, 0.5382, 0.5369, 0.9085, 0.7186, 0.4819], [ 0.8849, 0.3510, 0.5917, 0.8967, 0.5382, 0.5369], [ 0.8731, 0.1601, 0.6468, 0.8849, 0.3510, 0.5917], [ 0.8613, -0.0312, 0.7015, 0.8731, 0.1601, 0.6468], [ 0.8496, -0.4592, 0.7563, 0.8613, -0.0312, 0.7015], [ 0.8197, -0.5388, 0.7455, 0.8496, -0.4592, 0.7563], [ 0.7899, -0.6143, 0.7346, 0.8197, -0.5388, 0.7455], [ 0.7599, -0.6448, 0.7238, 0.7899, -0.6143, 0.7346], [ 0.7301, -0.6448, 0.7129, 0.7599, -0.6448, 0.7238]], dtype=torch.float64)
tfms4.decode(tfms4[0])
data
tensor([[1.4220e+01, 2.2480e+02, 5.7990e+00], [1.4110e+01, 1.9528e+02, 6.5770e+00], [1.4230e+01, 2.4417e+02, 6.9310e+00], [1.4400e+01, 2.5392e+02, 7.2860e+00], [1.4090e+01, 1.7731e+02, 7.2510e+00], [1.3710e+01, 9.7070e+01, 6.6830e+00], [1.3080e+01, 3.9710e+01, 5.8510e+00], [1.2410e+01, 1.0650e+01, 5.2540e+00], [1.2270e+01, 3.2000e-01, 5.1640e+00], [1.2200e+01, 1.4211e-14, 5.0370e+00]], dtype=torch.float64)
mask
tensor([[False, True, True], [False, True, True], [False, True, True], [False, True, True], [False, True, True], [False, True, True], [False, True, True], [False, True, True], [False, True, True], [False, True, True]])
control
tensor([[ 15.0500, 255.1930, 5.1020, 15.1390, 287.1000, 4.9000], [ 14.9610, 221.4270, 5.3050, 15.0500, 255.1930, 5.1020], [ 14.8720, 186.3800, 5.5070, 14.9610, 221.4270, 5.3050], [ 14.7830, 150.6500, 5.7100, 14.8720, 186.3800, 5.5070], [ 14.6940, 114.8490, 5.9120, 14.7830, 150.6500, 5.7100], [ 14.6060, 34.7280, 6.1140, 14.6940, 114.8490, 5.9120], [ 14.3800, 19.8430, 6.0740, 14.6060, 34.7280, 6.1140], [ 14.1550, 5.7120, 6.0340, 14.3800, 19.8430, 6.0740], [ 13.9290, 0.0000, 5.9940, 14.1550, 5.7120, 6.0340], [ 13.7040, 0.0000, 5.9540, 13.9290, 0.0000, 5.9940]], dtype=torch.float64)
6) To Tuple
Fastai likes to work with tuples (in particular for collating)… so for now convert to a tuple. Maybe find a way to mimic a tuple in MeteoImpTensor.
Also duplicate the data, so the same data serves as both training input and label (a plausible sketch follows).
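A plausible sketch of what ToTuple does (an assumption, but consistent with the batches later on, where input and label are identical triples):

class ToTupleSketch(Transform):
    def encodes(self, x: MeteoImpTensor):
        # duplicate the item: the same (data, mask, control) serves as input and label
        return (tuple(x), tuple(x))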
ToTuple
ToTuple (enc=None, dec=None, split_idx=None, order=None)
Delegates (__call__, decode, setup) to (encodes, decodes, setups) if split_idx matches
ToTuple()
ToTuple
(object,object) -> encodes
(object,object) -> decodes
tfms5 = TfmdLists(tfms4.items, [*tfms4.fs, ToTuple])
tfms5[0];
Pipeline
Generators
as_generator
as_generator (x:collections.abc.Generator|object, iter=False)
Maybe convert iterable to infinite generator
Type | Default | Details | |
---|---|---|---|
x | collections.abc.Generator | object | ||
iter | bool | False | should generator return x or iterate over the elements of x |
g_var = ['TA', 'SW_IN']
isinstance(g_var, Iterable)
True
as_generator(g_var), next(as_generator(g_var))
(<itertools.cycle>, ['TA', 'SW_IN'])
as_generator([1,2]), next(as_generator([1,2]))
(<itertools.cycle>, [1, 2])
Gap Len Generator
gen_gap_len
gen_gap_len (min_v=1, max_v=50)
g_len = gen_gap_len()
[next(g_len) for _ in range(10)]
[20, 9, 32, 49, 38, 25, 26, 14, 3, 43]
Gamma
The gap lengths are drawn from a gamma distribution, so we have a long tail and a minimum value of 0, but compared to an exponential distribution we don't have many gaps of length 0.
\[ p(x)=\frac{1}{\Gamma(k) \theta^k} x^{k - 1} e^{-\frac{x}{\theta}}\]
\[\begin{align}\mu &= k\theta\\ m &= (k-1)\theta \end{align}\] where \(m\) is the mode and \(\mu\) is the mean (for \(k>1\)), which is what we want
import matplotlib.pyplot as plt
import scipy.special as sps

mean = 10
scale = mean * .6
shape = mean/scale

mode = (shape-1)*scale

x = np.arange(0,100)
y = x**(shape-1)*(np.exp(-x/scale) / (sps.gamma(shape)*scale**shape))

plt.plot(x, y)
plt.show()
This is a rough guess at a good probability density for the gap length. The actual distribution should come from the FLUXNET data.
gen_gap_len_gamma
gen_gap_len_gamma (mean:float, min_v=1, max_v=50)
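An assumed implementation consistent with the signature and with the density plotted above (gamma draws with the same mean/scale parameterization, clipped to [min_v, max_v]); the _sketch suffix marks it as a guess:

def gen_gap_len_gamma_sketch(mean: float, min_v=1, max_v=50):
    scale = mean * .6   # same parameterization as the plot above
    shape = mean / scale
    while True:
        # draw a gamma-distributed length and clip it to the allowed range
        yield int(np.clip(np.random.gamma(shape, scale=scale), min_v, max_v))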
next(gen_gap_len(1))
34
g_len = gen_gap_len_gamma(10)
gap_lens_sample = pd.DataFrame([next(g_len) for _ in range(1000)])
gap_lens_sample.hist(bins=gap_lens_sample[0].max()//2)
array([[<AxesSubplot: title={'center': '0'}>]], dtype=object)
Var Sel Generator
Draws the number of variables from a uniform distribution between 1 and the maximum number of variables, then selects the variables with equal probability.
gen_var_sel
gen_var_sel (vars, n_var=None)
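An assumed implementation matching the description (again a sketch, not the actual code): draw the number of variables uniformly, then sample that many without replacement.

def gen_var_sel_sketch(vars, n_var=None):
    while True:
        # number of variables: fixed if n_var is given, else uniform in [1, len(vars)]
        n = n_var if n_var is not None else np.random.randint(1, len(vars) + 1)
        yield list(np.random.choice(vars, size=n, replace=False))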
= gen_var_sel(list("abcdefg")) g_var
next(g_var) for _ in range(5)] [
[['d'], ['e'], ['d'], ['a'], ['d']]
[next(g_var) for _ in range(5)]
[['f'], ['g'], ['c'], ['d'], ['a']]
= gen_var_sel(["a", "bb", "ccc"], 1) g_var1
next(g_var1) for _ in range(10)] [
[['ccc'], ['bb'], ['a'], ['bb'], ['a'], ['a'], ['bb'], ['a'], ['bb'], ['ccc']]
g_var = gen_var_sel(['TA', 'VPD', 'SW_IN'])
[next(g_var) for _ in range(10)]
[['SW_IN', 'VPD', 'TA'],
['SW_IN', 'TA', 'VPD'],
['TA', 'VPD', 'SW_IN'],
['TA', 'VPD', 'SW_IN'],
['TA', 'VPD', 'SW_IN'],
['SW_IN', 'TA', 'VPD'],
['VPD', 'SW_IN', 'TA'],
['VPD', 'SW_IN', 'TA'],
['TA', 'VPD', 'SW_IN'],
['VPD', 'TA', 'SW_IN']]
block_len = 10
control_lags = [1]
control_repeat = 1 + len(control_lags)
block_ids = list(range(max(control_lags), (len(hai) // block_len) - 1))[:10]
gap_len = 2
var_sel = ['TA','SW_IN']
Shifts generator
gen_shifts
gen_shifts (var)
Generate shifts for a random distribution with variances var
Block Ids
get_block_ids
get_block_ids (n_rep:int, total_len:int, block_len:int, var_sel:collections.abc.Iterable|collections.abc.Generator, gap_len:collections.abc.Iterable|collections.abc.Generator, shifts:collections.abc.Iterable|collections.abc.Generator|None=None, offset:int=0)
Type | Default | Details | |
---|---|---|---|
n_rep | int | | number of repetitions for each item |
total_len | int | | total len of the dataframe |
block_len | int | | len of a block |
var_sel | Iterable \| Generator | | returns list[str] to select variables |
gap_len | Iterable \| Generator | | returns int for gap len |
shifts | Iterable \| Generator \| None | None | if None make at same distance |
offset | int | 0 | starting point for first item (to allow for control lags and shifts) |
get_block_ids(n_rep = 2, total_len = 100, block_len = 10, var_sel = ['TA'], gap_len = 10)
[MeteoImpItem(i=0, shift=-5, var_sel=['TA'], gap_len=10),
MeteoImpItem(i=0, shift=0, var_sel=['TA'], gap_len=10),
MeteoImpItem(i=1, shift=-5, var_sel=['TA'], gap_len=10),
MeteoImpItem(i=1, shift=0, var_sel=['TA'], gap_len=10),
MeteoImpItem(i=2, shift=-5, var_sel=['TA'], gap_len=10),
MeteoImpItem(i=2, shift=0, var_sel=['TA'], gap_len=10),
MeteoImpItem(i=3, shift=-5, var_sel=['TA'], gap_len=10),
MeteoImpItem(i=3, shift=0, var_sel=['TA'], gap_len=10),
MeteoImpItem(i=4, shift=-5, var_sel=['TA'], gap_len=10),
MeteoImpItem(i=4, shift=0, var_sel=['TA'], gap_len=10),
MeteoImpItem(i=5, shift=-5, var_sel=['TA'], gap_len=10),
MeteoImpItem(i=5, shift=0, var_sel=['TA'], gap_len=10),
MeteoImpItem(i=6, shift=-5, var_sel=['TA'], gap_len=10),
MeteoImpItem(i=6, shift=0, var_sel=['TA'], gap_len=10),
MeteoImpItem(i=7, shift=-5, var_sel=['TA'], gap_len=10),
MeteoImpItem(i=7, shift=0, var_sel=['TA'], gap_len=10),
MeteoImpItem(i=8, shift=-5, var_sel=['TA'], gap_len=10),
MeteoImpItem(i=8, shift=0, var_sel=['TA'], gap_len=10)]
get_block_ids(n_rep = 3, total_len = 30, block_len = 10, var_sel = gen_var_sel(['TA', 'SW_IN', 'VPD']), gap_len = 10)
[MeteoImpItem(i=0, shift=-5, var_sel=['VPD', 'TA'], gap_len=10),
MeteoImpItem(i=0, shift=-2, var_sel=['TA', 'VPD'], gap_len=10),
MeteoImpItem(i=0, shift=1, var_sel=['TA', 'VPD'], gap_len=10),
MeteoImpItem(i=1, shift=4, var_sel=['TA', 'SW_IN'], gap_len=10),
MeteoImpItem(i=1, shift=-5, var_sel=['VPD', 'TA'], gap_len=10),
MeteoImpItem(i=1, shift=-2, var_sel=['TA', 'VPD'], gap_len=10)]
Pipeline
imp_pipeline
imp_pipeline (df, control, var_sel, gap_len, block_len, control_lags, n_rep, shifts=None, offset=None)
pipeline, block_ids = imp_pipeline(hai, hai_era, var_sel, gap_len, block_len, control_lags, n_rep=10)
pipeline
[BlockIndexTransform:
encodes: (MeteoImpItem,object) -> encodes
decodes: ,
BlockDfTransform:
encodes: (MeteoImpIndex,object) -> encodes
decodes: ,
AddGapTransform:
encodes: (DataControl,object) -> encodes
decodes: ,
__main__.MeteoImpDf2Tensor,
MeteoImpNormalize -- {'mean_data': tensor([ 8.3339, 120.9578, 3.3807], dtype=torch.float64), 'std_data': tensor([ 7.9246, 204.0026, 4.3684], dtype=torch.float64), 'mean_control': tensor([ 8.1948, 120.6864, 3.3253, 8.1948, 120.6864, 3.3253],
dtype=torch.float64), 'std_control': tensor([ 7.5459, 187.1730, 3.6871, 7.5459, 187.1730, 3.6871],
dtype=torch.float64)}:
encodes: (MeteoImpTensor,object) -> encodes
decodes: (MeteoImpTensor,object) -> decodes,
__main__.ToTuple]
pp = Pipeline(pipeline)
pp
Pipeline: BlockIndexTransform -> BlockDfTransform -> AddGapTransform -> MeteoImpDf2Tensor -> MeteoImpNormalize -- {'mean_data': tensor([ 8.3339, 120.9578, 3.3807], dtype=torch.float64), 'std_data': tensor([ 7.9246, 204.0026, 4.3684], dtype=torch.float64), 'mean_control': tensor([ 8.1948, 120.6864, 3.3253, 8.1948, 120.6864, 3.3253],
dtype=torch.float64), 'std_control': tensor([ 7.5459, 187.1730, 3.6871, 7.5459, 187.1730, 3.6871],
dtype=torch.float64)} -> ToTuple
Dataloader
random splitter for validation/training set
reset_seed()
splits = EndSplitter()(block_ids) # last 20% is validation data
The pipeline is repeated twice since the same pipeline is used both for the training data and for the labels.
In theory the label creation could be optimized to take only the data in the gap and not the control, but for now it works and the overhead is minimal.
dls = TfmdLists(block_ids, pipeline, splits=splits).dataloaders(bs=2)
dls.one_batch()
((tensor([[[-0.3298, -0.5929, -0.5640],
[-0.3563, -0.5929, -0.6420],
[-0.3387, -0.5929, -0.6720],
[-0.3261, -0.5929, -0.6722],
[-0.3248, -0.5929, -0.6924],
[-0.3172, -0.5929, -0.7105],
[-0.3071, -0.5929, -0.7066],
[-0.2933, -0.5929, -0.6883],
[-0.2907, -0.5767, -0.6901],
[-0.2882, -0.5386, -0.6929]],
[[-0.7097, -0.3020, -0.7189],
[-0.6971, -0.3474, -0.7192],
[-0.6857, -0.2052, -0.7045],
[-0.6630, -0.0902, -0.6656],
[-0.6390, 0.0122, -0.6276],
[-0.6062, 0.4852, -0.5859],
[-0.5709, 0.7880, -0.5388],
[-0.5406, 0.5899, -0.5024],
[-0.5267, 0.4234, -0.5015],
[-0.4901, 0.7530, -0.4832]]], device='cuda:0', dtype=torch.float64),
tensor([[[ True, True, True],
[ True, True, True],
[ True, True, True],
[ True, True, True],
[False, False, True],
[False, False, True],
[ True, True, True],
[ True, True, True],
[ True, True, True],
[ True, True, True]],
[[ True, True, True],
[ True, True, True],
[ True, True, True],
[ True, True, True],
[False, False, True],
[False, False, True],
[ True, True, True],
[ True, True, True],
[ True, True, True],
[ True, True, True]]], device='cuda:0'),
tensor([[[-0.3028, -0.6448, -0.5905, -0.2960, -0.6448, -0.5485],
[-0.3097, -0.6448, -0.6326, -0.3028, -0.6448, -0.5905],
[-0.3166, -0.6448, -0.6743, -0.3097, -0.6448, -0.6326],
[-0.3235, -0.6448, -0.7164, -0.3166, -0.6448, -0.6743],
[-0.3224, -0.6448, -0.7188, -0.3235, -0.6448, -0.7164],
[-0.3213, -0.6448, -0.7213, -0.3224, -0.6448, -0.7188],
[-0.3203, -0.6448, -0.7240, -0.3213, -0.6448, -0.7213],
[-0.3192, -0.6448, -0.7264, -0.3203, -0.6448, -0.7240],
[-0.3182, -0.6401, -0.7288, -0.3192, -0.6448, -0.7264],
[-0.3171, -0.6114, -0.7313, -0.3182, -0.6401, -0.7288]],
[[-0.5342, -0.5229, -0.6567, -0.5343, -0.5707, -0.6594],
[-0.5342, -0.2773, -0.6540, -0.5342, -0.5229, -0.6567],
[-0.5021, -0.1783, -0.6084, -0.5342, -0.2773, -0.6540],
[-0.4702, -0.0844, -0.5629, -0.5021, -0.1783, -0.6084],
[-0.4381, 0.0029, -0.5173, -0.4702, -0.0844, -0.5629],
[-0.4060, 0.0821, -0.4717, -0.4381, 0.0029, -0.5173],
[-0.3741, 0.1518, -0.4262, -0.4060, 0.0821, -0.4717],
[-0.3420, 0.4581, -0.3806, -0.3741, 0.1518, -0.4262],
[-0.3166, 0.5191, -0.3532, -0.3420, 0.4581, -0.3806],
[-0.2911, 0.5640, -0.3261, -0.3166, 0.5191, -0.3532]]],
device='cuda:0', dtype=torch.float64)),
(tensor([[[-0.3298, -0.5929, -0.5640],
[-0.3563, -0.5929, -0.6420],
[-0.3387, -0.5929, -0.6720],
[-0.3261, -0.5929, -0.6722],
[-0.3248, -0.5929, -0.6924],
[-0.3172, -0.5929, -0.7105],
[-0.3071, -0.5929, -0.7066],
[-0.2933, -0.5929, -0.6883],
[-0.2907, -0.5767, -0.6901],
[-0.2882, -0.5386, -0.6929]],
[[-0.7097, -0.3020, -0.7189],
[-0.6971, -0.3474, -0.7192],
[-0.6857, -0.2052, -0.7045],
[-0.6630, -0.0902, -0.6656],
[-0.6390, 0.0122, -0.6276],
[-0.6062, 0.4852, -0.5859],
[-0.5709, 0.7880, -0.5388],
[-0.5406, 0.5899, -0.5024],
[-0.5267, 0.4234, -0.5015],
[-0.4901, 0.7530, -0.4832]]], device='cuda:0', dtype=torch.float64),
tensor([[[ True, True, True],
[ True, True, True],
[ True, True, True],
[ True, True, True],
[False, False, True],
[False, False, True],
[ True, True, True],
[ True, True, True],
[ True, True, True],
[ True, True, True]],
[[ True, True, True],
[ True, True, True],
[ True, True, True],
[ True, True, True],
[False, False, True],
[False, False, True],
[ True, True, True],
[ True, True, True],
[ True, True, True],
[ True, True, True]]], device='cuda:0'),
tensor([[[-0.3028, -0.6448, -0.5905, -0.2960, -0.6448, -0.5485],
[-0.3097, -0.6448, -0.6326, -0.3028, -0.6448, -0.5905],
[-0.3166, -0.6448, -0.6743, -0.3097, -0.6448, -0.6326],
[-0.3235, -0.6448, -0.7164, -0.3166, -0.6448, -0.6743],
[-0.3224, -0.6448, -0.7188, -0.3235, -0.6448, -0.7164],
[-0.3213, -0.6448, -0.7213, -0.3224, -0.6448, -0.7188],
[-0.3203, -0.6448, -0.7240, -0.3213, -0.6448, -0.7213],
[-0.3192, -0.6448, -0.7264, -0.3203, -0.6448, -0.7240],
[-0.3182, -0.6401, -0.7288, -0.3192, -0.6448, -0.7264],
[-0.3171, -0.6114, -0.7313, -0.3182, -0.6401, -0.7288]],
[[-0.5342, -0.5229, -0.6567, -0.5343, -0.5707, -0.6594],
[-0.5342, -0.2773, -0.6540, -0.5342, -0.5229, -0.6567],
[-0.5021, -0.1783, -0.6084, -0.5342, -0.2773, -0.6540],
[-0.4702, -0.0844, -0.5629, -0.5021, -0.1783, -0.6084],
[-0.4381, 0.0029, -0.5173, -0.4702, -0.0844, -0.5629],
[-0.4060, 0.0821, -0.4717, -0.4381, 0.0029, -0.5173],
[-0.3741, 0.1518, -0.4262, -0.4060, 0.0821, -0.4717],
[-0.3420, 0.4581, -0.3806, -0.3741, 0.1518, -0.4262],
[-0.3166, 0.5191, -0.3532, -0.3420, 0.4581, -0.3806],
[-0.2911, 0.5640, -0.3261, -0.3166, 0.5191, -0.3532]]],
device='cuda:0', dtype=torch.float64)))
dls.device
device(type='cuda', index=0)
@typedispatch
def show_batch(x: tuple, y, samples, ctxs=None, max_n=6):
    return x
# dls.show_batch()
dls._types
{tuple: [{tuple: [torch.Tensor, torch.Tensor, torch.Tensor]},
{tuple: [torch.Tensor, torch.Tensor, torch.Tensor]}]}
from fastcore.foundation import *
imp_dataloader
imp_dataloader (df, control, var_sel, gap_len, block_len, control_lags, n_rep, bs, shifts=None, offset=None)
imp_dataloader
<function __main__.imp_dataloader(df, control, var_sel, gap_len, block_len, control_lags, n_rep, bs, shifts=None, offset=None)>
dls = imp_dataloader(hai, hai_era, var_sel, gap_len=10, block_len=200, control_lags=[1], n_rep = 20, bs=10).cpu()
dls.one_batch()[0][0].shape
torch.Size([10, 200, 3])
dls = dls.cpu()
Model
Data type
This is the datatype output by the model: a custom class that both supports fastai processing and has convenience functions.
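A plausible sketch (pure assumption) of such a datatype: a list subclass with named accessors, so that fastai can collate and index it while the rest of the code can still write pred.mean / pred.std:

class NormalsParamsSketch(list):
    def __init__(self, mean, std):
        super().__init__([mean, std])    # behaves as a plain sequence for fastai
        self.mean, self.std = mean, std  # named accessors for convenience
    def __repr__(self):
        return f"NormalsParams(mean={self.mean}, std={self.std})"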
MNormalsParams
MNormalsParams (*args)
Built-in mutable sequence.
If no argument is given, the constructor creates a new empty list. The argument must be an iterable if specified.
NormalsParams
NormalsParams (*args)
Built-in mutable sequence.
If no argument is given, the constructor creates a new empty list. The argument must be an iterable if specified.
NormalsParams(0, 1)
__main__.NormalsParams(mean=0, std=1)
MNormalsParams(0, 1)
__main__.MNormalsParams(mean=0, cov=1)
Forward Function
In order to use the Kalman filter as a PyTorch module we need to add a forward method to it.
model = KalmanFilterSR.init_random(n_dim_obs = hai.shape[-1], n_dim_state = hai.shape[-1], n_dim_contr = hai_era.shape[-1]*control_repeat)
type(model)
meteo_imp.kalman.filter.KalmanFilterSR
control_repeat
2
model
Kalman Filter (3 obs, 3 state, 6 contr)
$A$
state | x_0 | x_1 | x_2 |
---|---|---|---|
x_0 | 0.0937 | 0.6706 | 0.1638 |
x_1 | 0.9272 | 0.2620 | 0.4967 |
x_2 | 0.2630 | 0.1175 | 0.1694 |
$Q$
state | x_0 | x_1 | x_2 |
---|---|---|---|
x_0 | 0.9201 | 0.5029 | 0.2247 |
x_1 | 0.5029 | 1.0777 | 1.0181 |
x_2 | 0.2247 | 1.0181 | 1.6707 |
$b$
state | offset |
---|---|
x_0 | 0.2100 |
x_1 | 0.4890 |
x_2 | 0.0564 |
$H$
variable | x_0 | x_1 | x_2 |
---|---|---|---|
y_0 | 0.6441 | 0.2801 | 0.9132 |
y_1 | 0.0329 | 0.4856 | 0.9927 |
y_2 | 0.5895 | 0.2611 | 0.9413 |
$R$
variable | y_0 | y_1 | y_2 |
---|---|---|---|
y_0 | 1.4459 | 0.0555 | 1.0762 |
y_1 | 0.0555 | 0.6715 | 0.6483 |
y_2 | 1.0762 | 0.6483 | 2.9768 |
$d$
variable | offset |
---|---|
y_0 | 0.1371 |
y_1 | 0.8726 |
y_2 | 0.5590 |
$B$
state | c_0 | c_1 | c_2 | c_3 | c_4 | c_5 |
---|---|---|---|---|---|---|
x_0 | 0.6319 | 0.6734 | 0.7937 | 0.6468 | 0.5825 | 0.4599 |
x_1 | 0.7960 | 0.9038 | 0.9735 | 0.6428 | 0.3725 | 0.2052 |
x_2 | 0.0507 | 0.4448 | 0.5775 | 0.7237 | 0.5927 | 0.3217 |
$m_0$
state | mean |
---|---|
x_0 | 0.6690 |
x_1 | 0.1554 |
x_2 | 0.0821 |
$P_0$
state | x_0 | x_1 | x_2 |
---|---|---|---|
x_0 | 1.2632 | 0.0221 | 0.3865 |
x_1 | 0.0221 | 0.5081 | 0.3983 |
x_2 | 0.3865 | 0.3983 | 1.8304 |
input = dls.one_batch()[0]
model._predict_filter(*input);
forward
Built-in mutable sequence.
If no argument is given, the constructor creates a new empty list. The argument must be an iterable if specified.
model.pred_std
False
pred = model.predict(*input)
pred.mean.shape, pred.cov.shape
(torch.Size([10, 200, 3]), torch.Size([10, 200, 3, 3]))
input = dls.one_batch()[0]
target = dls.one_batch()[1]
model.state_dict()
OrderedDict([('A',
tensor([[[0.0937, 0.6706, 0.1638],
[0.9272, 0.2620, 0.4967],
[0.2630, 0.1175, 0.1694]]], dtype=torch.float64)),
('H',
tensor([[[0.6441, 0.2801, 0.9132],
[0.0329, 0.4856, 0.9927],
[0.5895, 0.2611, 0.9413]]], dtype=torch.float64)),
('B',
tensor([[[0.6319, 0.6734, 0.7937, 0.6468, 0.5825, 0.4599],
[0.7960, 0.9038, 0.9735, 0.6428, 0.3725, 0.2052],
[0.0507, 0.4448, 0.5775, 0.7237, 0.5927, 0.3217]]],
dtype=torch.float64)),
('Q_raw',
tensor([[[0.4760, 0.0000, 0.0000],
[0.5243, 0.3714, 0.0000],
[0.2343, 0.9991, 0.1775]]], dtype=torch.float64)),
('R_raw',
tensor([[[0.8451, 0.0000, 0.0000],
[0.0462, 0.2360, 0.0000],
[0.8950, 0.7419, 0.9471]]], dtype=torch.float64)),
('b',
tensor([[[0.2100],
[0.4890],
[0.0564]]], dtype=torch.float64)),
('d',
tensor([[[0.1371],
[0.8726],
[0.5590]]], dtype=torch.float64)),
('m0',
tensor([[[0.6690],
[0.1554],
[0.0821]]], dtype=torch.float64)),
('P0_raw',
tensor([[[0.7309, 0.0000, 0.0000],
[0.0196, 0.0384, 0.0000],
[0.3438, 0.5494, 0.8238]]], dtype=torch.float64))])
data = input[0][0]
data.shape
torch.Size([200, 3])
mask = input[1][0]
control = input[2][0]
mask.shape
torch.Size([200, 3])
data.device
device(type='cpu')
torch.device
torch.device
data.shape, mask.shape
(torch.Size([200, 3]), torch.Size([200, 3]))
model.predict(data.unsqueeze(0), mask.unsqueeze(0), control.unsqueeze(0));
model.use_smooth = True
pred = model(input)
pred[0].shape
torch.Size([10, 200, 3])
pred[1].shape
torch.Size([10, 200, 3, 3])
model.use_smooth = False
pred_filt = model(input)
pred_filt[1].shape
torch.Size([10, 200, 3, 3])
type(pred), type(pred_filt)
(__main__.MNormalsParams, __main__.MNormalsParams)
pred_filt.mean.shape, pred_filt.cov.shape
(torch.Size([10, 200, 3]), torch.Size([10, 200, 3, 3]))
test_ne(pred, pred_filt)
Loss Function
Add support for a complete loss (also outside the gap) and for a filter loss (don't run the smoother).
There are two ways to compute the loss: one is to do it for all predictions, the other for the gap only (only_gap).
Play around with flattening + diagonal.
means, covs = pred
data, mask, contr = target
pred.mean.shape, pred.cov.shape
(torch.Size([10, 200, 3]), torch.Size([10, 200, 3, 3]))
get_only_gap
get_only_gap (mask, *args)
for each element in arg return only the portion where there is a gap at the time level
KalmanLoss
KalmanLoss (only_gap:bool=False, use_std:bool=False, reduction:str='mean', reduction_inbatch:str='sum')
Initialize self. See help(type(self)) for accurate signature.
Type | Default | Details | |
---|---|---|---|
only_gap | bool | False | loss for all predictions or only gap. Expects predictions only for gap |
use_std | bool | False | loss on stds otherwise with full cov matrices. |
reduction | str | mean | one of [‘sum’, ‘mean’, ‘none’] reduction between batches |
reduction_inbatch | str | sum | one of [‘sum’, ‘mean’, ‘none’] reduction inside a batch |
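As a minimal sketch of what the loss plausibly computes (an assumption, but consistent with the reductions below, where reduction_inbatch='mean' divides the default result by the block length): the negative log-likelihood of the data under the predicted multivariate normals, reduced first over time and then over the batch.

import torch
from torch.distributions import MultivariateNormal

def kalman_nll_sketch(pred, target, reduction_inbatch="sum"):
    mean, cov = pred.mean, pred.cov   # predicted normals, [batch, time, obs(, obs)]
    data, mask, _ = target            # same structure as the model input
    ll = MultivariateNormal(mean, cov).log_prob(data)  # [batch, time]
    ll = ll.sum(-1) if reduction_inbatch == "sum" else ll.mean(-1)
    return -ll.mean()                 # 'mean' reduction between batches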
input = target = imp_dataloader(hai, hai_era, var_sel, gap_len=5, block_len=10, control_lags=[1], n_rep = 1, bs=2).cpu().one_batch()[0]
model = KalmanFilterSR.init_random(n_dim_obs = hai.shape[-1], n_dim_state = hai.shape[-1], n_dim_contr = hai_era.shape[-1]*control_repeat)
model.use_smooth = True
# model.pred_only_gap = False
# model.use_conditional = False
pred = model(input)
pred.cov[0][0].shape
torch.Size([3, 3])
KalmanLoss()(pred, target)
tensor(41.3868, dtype=torch.float64, grad_fn=<MeanBackward0>)
KalmanLoss(use_std=True)(pred, target)
tensor(42.7013, dtype=torch.float64, grad_fn=<MeanBackward0>)
KalmanLoss(reduction='mean')(pred, target)
tensor(41.3868, dtype=torch.float64, grad_fn=<MeanBackward0>)
KalmanLoss(reduction_inbatch='mean')(pred, target)
tensor(4.1387, dtype=torch.float64, grad_fn=<MeanBackward0>)
test_fail(KalmanLoss(reduction_inbatch='fail'), args=(pred, target))
Only Gap
model_gap = KalmanFilterSR.init_random(n_dim_obs = hai.shape[-1], n_dim_state = hai.shape[-1], n_dim_contr = hai_era.shape[-1]*control_repeat, pred_only_gap = True, use_conditional=True)
input_gap = target_gap = imp_dataloader(hai, hai_era, var_sel, gap_len=5, block_len=10, control_lags=[1], n_rep = 1, bs=2).cpu().one_batch()[0]
pred_gap = model_gap(input_gap)
KalmanLoss(only_gap=True)(pred_gap, target_gap)
tensor(21.6366, dtype=torch.float64, grad_fn=<MeanBackward0>)
KalmanLoss(only_gap=True)(pred_gap, target_gap).backward(retain_graph=True)
Metrics
pred0, targ = target[0][0].detach().cpu(), pred[0][0].detach().cpu()
pred0.shape, targ.shape
(torch.Size([10, 3]), torch.Size([10, 3]))
The shape of the input is very important for the r2_score, which cannot be batched and needs to keep the multidimensional input.
\[ R^2(y, \hat{y}) = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \]
r2_score(pred0, targ)
-7.491135595391619e+30
="raw_values") r2_score(pred0, targ, multioutput
array([-2.93273242e+03, -2.24734068e+31, -3.74132968e+03])
r2_score(pred0.flatten(), targ.flatten())
-2.0463836801133035
While for the rmse it is okay to batch and flatten the input. The only difference is how the mean is computed, which is minor.
mean_squared_error(pred0, targ)
1.0738884633304902
mean_squared_error(pred0.flatten(), targ.flatten())
1.0738884633304902
Wrapper around fastai metrics to support masked tensors and normal distributions
The problem is that fastai metrics by default flatten everything … so need to reimplement them
myr2 = skm_to_fastai(r2_score, flatten=False)
myr2(pred[0][0], target[0][0])
-7.491135595391619e+30
but the mask is still flattening the data ….
m = target[1][0]
m.shape
torch.Size([10, 3])
Need to get the mask as a matrix and not as a vector, so drop columns and rows that are all True and then check that the resulting mask is all False.
mask_sub = m[:, ~m.all(0)][~m.all(1),:]
mask_sub
tensor([[False, False],
[False, False],
[False, False],
[False, False],
[False, False]])
~mask_sub.any()
tensor(True)
m2 = m.clone()
m2[0,0] = False
m2[:, ~m2.all(0)][~m2.all(1),:]
tensor([[False, True],
[False, False],
[False, False],
[False, False],
[False, False],
[False, False]])
ImpMetric
ImpMetric (metric, base_name, only_gap=False, flatten=False)
Average the values of func
taking into account potential different batch sizes
imp_rmse
imp_rmse (preds, targs)
rmse_mask.name, rmse_gap.name
('rmse', 'rmse_gap')
rmse_mask(pred, target)
0.747701108455658
model_gap = KalmanFilterSR.init_random(n_dim_obs = hai.shape[-1], n_dim_state = hai.shape[-1], n_dim_contr = hai_era.shape[-1]*control_repeat, pred_only_gap = True, use_conditional=True)
input_gap = target_gap = imp_dataloader(hai, hai_era, var_sel, gap_len=5, block_len=10, control_lags=[1], n_rep = 1, bs=2).cpu().one_batch()[0]
pred_gap = model_gap(input)
rmse_gap(pred_gap, target)
1.09806489944458
r2_mask.name
'r2'
r2_mask(pred, target)
-4.9728401966581414e+30
Callback
save the model state
SaveParams
SaveParams (param_name)
Basic class handling tweaks of the training loop by changing a Learner
in various events
debug_preds = []
class DebugPredCallback(Callback):
    order = 0
    def after_validate(self):
        if hasattr(self, 'gather_preds'):
            debug_preds.append(self.gather_preds.preds)
Learner
obs_cov_history = SaveParams('obs_cov')
all_data = CollectDataCallback()
model = KalmanFilterSR.init_random(n_dim_obs = hai.shape[1], n_dim_state = hai.shape[1], n_dim_contr = hai_era.shape[-1]*control_repeat) #.cuda()
# model._set_constraint('obs_cov', model.obs_cov, train=False)
dls = imp_dataloader(hai[10_000:11_000], hai_era, ['TA'], gap_len=5, block_len=20, control_lags=[1], n_rep=1, bs=2).cpu()
dls.one_batch()[0][0].device
device(type='cpu')
input, target = dls.one_batch()
pred = model(input)
KalmanLoss()(pred, target)
tensor(77.1323, dtype=torch.float64, grad_fn=<MeanBackward0>)
Float64
Float64Callback
Float64Callback (after_create=None, before_fit=None, before_epoch=None, before_train=None, before_batch=None, after_pred=None, after_loss=None, before_backward=None, after_cancel_backward=None, after_backward=None, before_step=None, after_cancel_step=None, after_step=None, after_cancel_batch=None, after_batch=None, after_cancel_train=None, after_train=None, before_validate=None, after_cancel_validate=None, after_validate=None, after_cancel_epoch=None, after_epoch=None, after_cancel_fit=None, after_fit=None)
Basic class handling tweaks of the training loop by changing a Learner
in various events
rmse
<fastai.metrics.AccumMetric>
learn = Learner(dls, model, loss_func=KalmanLoss(), cbs=[Float64Callback], metrics=rmse_mask)
learn.fit(1, 1e-3)
epoch | train_loss | valid_loss | rmse | time |
---|---|---|---|---|
0 | 85.788973 | 103.703916 | 1.533759 | 00:03 |
Only gap
learn_gap = Learner(dls, model_gap, loss_func=KalmanLoss(only_gap=True), cbs=[DebugPredCallback, Float64Callback], metrics=[rmse_gap])
pred_gap = learn_gap.model(dls.one_batch()[0])
KalmanLoss(only_gap=True)(pred_gap, dls.one_batch()[0])
tensor(6.2605, dtype=torch.float64, grad_fn=<MeanBackward0>)
learn_gap.fit(1, 1e-3)
epoch | train_loss | valid_loss | rmse_gap | time |
---|---|---|---|---|
0 | 5.226804 | 11.659894 | 1.231025 | 00:02 |
learn.loss
Predictions
The transformation pipeline is not working properly: there is a problem in decode_batch, as the _types are more nested than the predictions, which results in an error. On top of that, the pipeline is not reproducible and the test dataloaders do not appear to be deterministic, so almost everything is reimplemented from scratch.
see https://github.com/mone27/meteo_imp/blob/0335003405ec9bd3e3bd2641bc6d7924f34a0788/lib_nbs/kalman/10_fastai.ipynb for all details
Predictions from custom items
one_batch_with_items
one_batch_with_items (dls, items)
Makes a custom dataloader that returns only one batch with the given items
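A minimal sketch of what this could look like, assuming the fastai TfmdDL.new API (the exported implementation may differ):

def one_batch_with_items_sketch(dls, items):
    # build a dataloader over exactly the given items, with shuffling
    # disabled so the batch content is deterministic
    dl = dls.valid.new(dataset=items, bs=len(items), shuffle=False)
    return dl.one_batch()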
items = [MeteoImpItem(1, 0, 'TA', 5), MeteoImpItem(2, 0, 'TA', 5)]
input, _ = one_batch_with_items(dls, items)
# test that there is no shuffling
batch0 = one_batch_with_items(dls, items)
for _ in range(5):
    test_close(batch0[0][0], one_batch_with_items(dls, items)[0][0])
preds = learn.model(input)
preds_gap = learn_gap.model(input)
Predictions Transform Pipeline
Need to transform the predictions into a format that can be used (for plotting)
The steps are:
- convert covariance to std
- maybe buffer predictions if they are only for gap
- inverse normalize
- get original target (not transformed)
- transform to dataframe (with proper index/col names)
1) Cov 2 Std
Transform covariance to std, also supporting gap-only predictions
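The core of this step is just taking the marginal variances on the diagonal of each covariance matrix and applying a square root; a minimal sketch (the actual transform additionally dispatches over the gap-only list format):

import torch

def cov_to_std(cov: torch.Tensor) -> torch.Tensor:
    "(..., n, n) covariance matrices -> (..., n) standard deviations"
    return cov.diagonal(dim1=-2, dim2=-1).sqrt()

cov_to_std(torch.eye(3) * 4.)  # tensor([2., 2., 2.])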
CovStdTransform
CovStdTransform (enc=None, dec=None, split_idx=None, order=None)
Delegates (__call__, decode, setup) to (encodes, decodes, setups) if split_idx matches
CovStdTransform()
CovStdTransform
(MNormalsParams,object) -> encodes
(Tensor,object) -> encodes
(list,object) -> encodes
type(preds)
__main__.MNormalsParams
preds_0 = learn.model(input)
preds_1 = CovStdTransform()(preds_0)
preds_1.std.shape
torch.Size([2, 20, 3])
preds_gap_0 = learn_gap.model(input)
preds_gap_1 = CovStdTransform()(preds_gap_0)
preds_gap_1.std
[[tensor([0.9525], dtype=torch.float64, grad_fn=<SqrtBackward0>),
tensor([0.9526], dtype=torch.float64, grad_fn=<SqrtBackward0>),
tensor([0.9527], dtype=torch.float64, grad_fn=<SqrtBackward0>),
tensor([0.9522], dtype=torch.float64, grad_fn=<SqrtBackward0>),
tensor([0.9515], dtype=torch.float64, grad_fn=<SqrtBackward0>)],
[tensor([0.9525], dtype=torch.float64, grad_fn=<SqrtBackward0>),
tensor([0.9526], dtype=torch.float64, grad_fn=<SqrtBackward0>),
tensor([0.9527], dtype=torch.float64, grad_fn=<SqrtBackward0>),
tensor([0.9522], dtype=torch.float64, grad_fn=<SqrtBackward0>),
tensor([0.9515], dtype=torch.float64, grad_fn=<SqrtBackward0>)]]
2) Buffer gap only preds
in addition detach and move to CPU
buffer_pred_single
buffer_pred_single (preds:list[torch.Tensor], masks:torch.Tensor)
For gap-only predictions, add a buffer of NaN so they have the same shape as the targets
buffer_pred
buffer_pred (preds:list[list[torch.Tensor]], masks:torch.Tensor)
For gap-only predictions, add a buffer of NaN so they have the same shape as the targets
maybe_buffer_pred
maybe_buffer_pred (preds, masks)
If predictions are for gaps only, add a buffer so they have the same shape as the targets
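A minimal sketch of the buffering for a single series, assuming the mask convention visible in the assert below (mask True = value present; the predictions cover only the gap positions):

import torch

def buffer_pred_single_sketch(pred_gap: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    # start from an all-NaN tensor of the full target shape, then place the
    # gap-only predictions at the positions where the mask is False
    out = torch.full(mask.shape, float('nan'), dtype=pred_gap.dtype)
    out[~mask] = pred_gap.flatten()
    return out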
preds_2 = maybe_buffer_pred(preds_1, input[1])
test_eq(preds_1, preds_2)
preds_gap_2 = maybe_buffer_pred(preds_gap_1, input[1])
test_eq(preds_2.mean.shape, preds_gap_2.mean.shape)
assert (preds_gap_2.mean.isnan() == input[1]).all()
3) Inverse Normalize
here we use the decode step of the normalizer in the dataloader
def get_stats_np(*args):
    stats = get_stats(*args)
    return (stats[0].numpy(), stats[1].numpy())
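The decode itself is plain de-standardisation; a minimal sketch (note that a predictive std would only be scaled by std, not shifted by mean):

import torch

def inverse_normalize_sketch(x: torch.Tensor, mean: torch.Tensor, std: torch.Tensor) -> torch.Tensor:
    "Undo the (x - mean) / std standardisation."
    return x * std + mean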
InverseNormalize
InverseNormalize (mean, std)
Delegates (__call__, decode, setup) to (encodes, decodes, setups) if split_idx matches
preds_3 = InverseNormalize.from_dls(dls)(preds_2)
preds_2.mean.mean(-2), preds_3.mean.mean(-2)
(tensor([[-0.7011, -0.1037, -0.9301],
[-1.0411, -0.0590, -1.3921]], dtype=torch.float64),
tensor([[ 14.6384, 177.0957, 2.1694],
[ 13.4223, 188.0962, 0.4839]], dtype=torch.float64))
preds_gap_3 = InverseNormalize.from_dls(dls)(preds_gap_2)
torch.nanmean(preds_gap_2.mean, -2), torch.nanmean(preds_gap_3.mean, -2)
(tensor([[-0.6435, nan, nan],
[-0.5958, nan, nan]], dtype=torch.float64),
tensor([[14.8444, nan, nan],
[15.0151, nan, nan]], dtype=torch.float64))
4) Original target
we need the target as a dataframe, without the transformations, so we use the first part of the dls pipeline plus a custom aggregation into a list
orig_target
orig_target (dls, items)
targs = orig_target(dls, items)
len(targs), type(targs[0])
(2, __main__.MeteoImpDf)
for targ, o_targ in zip(input[1], targs):
    test_eq(targ.numpy(), o_targ.mask.to_numpy())
5) To Dataframe
NormalsDf
NormalsDf (mean, std)
DataFrames of Normal parameters (mean and std)
preds2df
preds2df (preds:__main__.NormalsParams, targs)
preds_5 = preds2df(preds_3, targs)
preds_gap_5 = preds2df(preds_gap_3, targs)
Get Predictions
Now combine all the steps above in one function (sketched below) that takes as arguments:
- model
- dataloader
- items

and returns the inverse-transformed preds, targs and metrics
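A hedged sketch of how steps 1-5 could compose (the exported predict_items below additionally handles metrics, detaching, and the custom-items batching):

def predict_items_sketch(model, dls, items):
    input, _ = one_batch_with_items(dls, items)    # batch for the requested items
    preds = CovStdTransform()(model(input))        # 1) covariance -> std
    preds = maybe_buffer_pred(preds, input[1])     # 2) buffer gap-only predictions
    preds = InverseNormalize.from_dls(dls)(preds)  # 3) undo the standardisation
    targs = orig_target(dls, items)                # 4) original, untransformed targets
    return preds2df(preds, targs), targs           # 5) to dataframes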
unsqueeze_maybe_list
unsqueeze_maybe_list (x)
add a dimension in front if x is a Tensor; wrap x in a list if it is a list
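Presumably equivalent to something like:

def unsqueeze_maybe_list_sketch(x):
    # hypothetical sketch of the documented behaviour
    return [x] if isinstance(x, list) else x.unsqueeze(0)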
_n_tuple(preds, 0).mean.shape
torch.Size([1, 20, 3])
_n_tuple(preds_gap, 0).mean.__len__()
1
_n_tuple(input, 0)[0].__len__()
1
_n_tuple(input, 0, False)[0].__len__()
20
PredictMetrics
PredictMetrics (learn)
Initialize self. See help(type(self)) for accurate signature.
PredictLossVar
PredictLossVar (only_gap:bool, var:int)
Compute loss only for one variable
input[1].shape
torch.Size([2, 20, 3])
KalmanLoss(False)(preds_0, input)
tensor(74.7852, dtype=torch.float64, grad_fn=<MeanBackward0>)
KalmanLoss(True)(preds_gap_0, input)
tensor(4.7129, dtype=torch.float64, grad_fn=<MeanBackward0>)
PredictLossVar(True, 0)(preds_gap_0, input)
tensor(4.7129, dtype=torch.float64, grad_fn=<MeanBackward0>)
PredictLossVar(False, 0)(preds_0, input)
tensor(74.7852, dtype=torch.float64, grad_fn=<MeanBackward0>)
predict_items
predict_items (model:meteo_imp.kalman.filter.KalmanFilterBase, dls:fastai.data.core.DataLoaders, items:list[list], metric_fn:Optional[Callable]=None)
f_preds, f_targs = predict_items(learn.model, dls, items)
len(f_preds)
2
f_preds, f_targs, loss = predict_items(learn.model, dls, items, KalmanLoss(False))
loss
tensor(74.7852, dtype=torch.float64, grad_fn=<MeanBackward0>)
f_preds, f_targs = predict_items(learn.model, dls, items + [MeteoImpItem(3, 2, 'TA', 10)])
len(f_preds)
3
learn_gap.model.use_conditional = False
f_preds_gap, f_targs_gap = predict_items(learn_gap.model, dls, items + [MeteoImpItem(3, 4, 'TA', 10)])
len(f_preds)
3
f_preds[0]
Normals Df
data
TA | SW_IN | VPD | |
---|---|---|---|
time | |||
2000-07-27 19:00:00 | 15.8604 | 247.1413 | 3.6759 |
2000-07-27 19:30:00 | 17.1074 | 222.8983 | 4.3447 |
2000-07-27 20:00:00 | 16.7494 | 212.7181 | 4.5155 |
2000-07-27 20:30:00 | 15.1599 | 150.3234 | 2.6903 |
2000-07-27 21:00:00 | 14.7076 | 147.0331 | 2.2668 |
2000-07-27 21:30:00 | 14.6867 | 156.1923 | 2.2391 |
2000-07-27 22:00:00 | 14.7689 | 169.1580 | 2.3664 |
2000-07-27 22:30:00 | 14.7338 | 177.7875 | 2.3559 |
2000-07-27 23:00:00 | 14.5411 | 178.0651 | 2.1209 |
2000-07-27 23:30:00 | 14.4147 | 176.9044 | 1.9582 |
2000-07-28 00:00:00 | 14.3162 | 175.7231 | 1.8447 |
2000-07-28 00:30:00 | 14.2605 | 175.4748 | 1.7815 |
2000-07-28 01:00:00 | 14.3124 | 178.9888 | 1.8398 |
2000-07-28 01:30:00 | 14.2570 | 178.2621 | 1.7823 |
2000-07-28 02:00:00 | 14.2076 | 177.5208 | 1.7210 |
2000-07-28 02:30:00 | 14.1662 | 176.4220 | 1.6684 |
2000-07-28 03:00:00 | 14.0800 | 173.6999 | 1.5688 |
2000-07-28 03:30:00 | 13.8980 | 167.9674 | 1.3614 |
2000-07-28 04:00:00 | 13.5876 | 158.9148 | 1.0064 |
2000-07-28 04:30:00 | 12.9515 | 140.7185 | 0.2796 |
std
TA | SW_IN | VPD | |
---|---|---|---|
time | |||
2000-07-27 19:00:00 | 5.1829 | 344.4108 | 5.2468 |
2000-07-27 19:30:00 | 5.1693 | 344.0653 | 5.2432 |
2000-07-27 20:00:00 | 5.1696 | 344.0760 | 5.2403 |
2000-07-27 20:30:00 | 5.1694 | 344.0763 | 5.2403 |
2000-07-27 21:00:00 | 5.1695 | 344.0778 | 5.2404 |
2000-07-27 21:30:00 | 5.1699 | 344.0815 | 5.2409 |
2000-07-27 22:00:00 | 5.1730 | 344.1296 | 5.2451 |
2000-07-27 22:30:00 | 5.3602 | 347.8643 | 5.4704 |
2000-07-27 23:00:00 | 5.3641 | 347.8706 | 5.4766 |
2000-07-27 23:30:00 | 5.3646 | 347.8681 | 5.4776 |
2000-07-28 00:00:00 | 5.3639 | 347.8583 | 5.4769 |
2000-07-28 00:30:00 | 5.3589 | 347.7736 | 5.4701 |
2000-07-28 01:00:00 | 5.1720 | 344.0764 | 5.2449 |
2000-07-28 01:30:00 | 5.1698 | 344.0765 | 5.2411 |
2000-07-28 02:00:00 | 5.1695 | 344.0768 | 5.2404 |
2000-07-28 02:30:00 | 5.1696 | 344.0786 | 5.2405 |
2000-07-28 03:00:00 | 5.1704 | 344.0882 | 5.2416 |
2000-07-28 03:30:00 | 5.1745 | 344.1348 | 5.2468 |
2000-07-28 04:00:00 | 5.1954 | 344.3778 | 5.2738 |
2000-07-28 04:30:00 | 5.3088 | 345.7310 | 5.4178 |
f_preds_gap[0]
Normals Df
data
TA | SW_IN | VPD | |
---|---|---|---|
time | |||
2000-07-27 19:00:00 | nan | nan | nan |
2000-07-27 19:30:00 | nan | nan | nan |
2000-07-27 20:00:00 | nan | nan | nan |
2000-07-27 20:30:00 | nan | nan | nan |
2000-07-27 21:00:00 | nan | nan | nan |
2000-07-27 21:30:00 | nan | nan | nan |
2000-07-27 22:00:00 | nan | nan | nan |
2000-07-27 22:30:00 | 15.1513 | nan | nan |
2000-07-27 23:00:00 | 15.1126 | nan | nan |
2000-07-27 23:30:00 | 15.1093 | nan | nan |
2000-07-28 00:00:00 | 15.0553 | nan | nan |
2000-07-28 00:30:00 | 15.1090 | nan | nan |
2000-07-28 01:00:00 | nan | nan | nan |
2000-07-28 01:30:00 | nan | nan | nan |
2000-07-28 02:00:00 | nan | nan | nan |
2000-07-28 02:30:00 | nan | nan | nan |
2000-07-28 03:00:00 | nan | nan | nan |
2000-07-28 03:30:00 | nan | nan | nan |
2000-07-28 04:00:00 | nan | nan | nan |
2000-07-28 04:30:00 | nan | nan | nan |
std
TA | SW_IN | VPD | |
---|---|---|---|
time | |||
2000-07-27 19:00:00 | nan | nan | nan |
2000-07-27 19:30:00 | nan | nan | nan |
2000-07-27 20:00:00 | nan | nan | nan |
2000-07-27 20:30:00 | nan | nan | nan |
2000-07-27 21:00:00 | nan | nan | nan |
2000-07-27 21:30:00 | nan | nan | nan |
2000-07-27 22:00:00 | nan | nan | nan |
2000-07-27 22:30:00 | 3.4073 | nan | nan |
2000-07-27 23:00:00 | 3.4076 | nan | nan |
2000-07-27 23:30:00 | 3.4079 | nan | nan |
2000-07-28 00:00:00 | 3.4062 | nan | nan |
2000-07-28 00:30:00 | 3.4038 | nan | nan |
2000-07-28 01:00:00 | nan | nan | nan |
2000-07-28 01:30:00 | nan | nan | nan |
2000-07-28 02:00:00 | nan | nan | nan |
2000-07-28 02:30:00 | nan | nan | nan |
2000-07-28 03:00:00 | nan | nan | nan |
2000-07-28 03:30:00 | nan | nan | nan |
2000-07-28 04:00:00 | nan | nan | nan |
2000-07-28 04:30:00 | nan | nan | nan |
Only Gap Context manager
only_gap_ctx
only_gap_ctx (learn, only_gap=True)
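Based on the flags printed below, a sketch of what the context manager plausibly does (attribute names taken from the usage; the save/restore logic is assumed):

from contextlib import contextmanager

@contextmanager
def only_gap_ctx_sketch(learn, only_gap=True):
    "Temporarily switch model, loss and metrics to gap-only mode, then restore."
    saved = (learn.model.pred_only_gap, learn.loss_func.only_gap,
             [m.only_gap for m in learn.metrics])
    learn.model.pred_only_gap = learn.loss_func.only_gap = only_gap
    for m in learn.metrics: m.only_gap = only_gap
    try:
        yield learn
    finally:
        learn.model.pred_only_gap, learn.loss_func.only_gap, metr = saved
        for m, f in zip(learn.metrics, metr): m.only_gap = f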
with only_gap_ctx(learn):
    print(learn.model.pred_only_gap)
    print(learn.loss_func.only_gap)
    print(learn.metrics[0].only_gap)
print(learn.model.pred_only_gap)
True
True
True
False
from meteo_imp.kalman.filter import with_settings
with with_settings(learn.model, use_conditional=False, pred_only_gap=True):
    pred_gap_buff = buffer_pred(learn.model.predict(*input).mean, input[1])[0]
    mask_na = ~pred_gap_buff.isnan()
with with_settings(learn.model, use_conditional=False, pred_only_gap=False):
    pred_ng = learn.model.predict(*input).mean[0]
test_close(pred_gap_buff[mask_na], pred_ng[mask_na])
Plotting
Plot results
format_metric
format_metric (name, val)
plot_result
plot_result (pred, targ, metrics=None, control_map=None, hide_no_gap=False, n_cols:int=3, bind_interaction:bool=True, units=None, ys=['value', 'value', 'control'], title='', sel=None, error=False, point=True, gap_area=True, control=False, props={})
Type | Default | Details | |
---|---|---|---|
pred | |||
targ | |||
metrics | NoneType | None | |
control_map | NoneType | None | |
hide_no_gap | bool | False | |
n_cols | int | 3 | |
bind_interaction | bool | True | Whether the sub-plots for each variable should be connected for zooming/panning |
units | NoneType | None | |
ys | list | ['value', 'value', 'control'] |
title | str | ||
sel | NoneType | None | |
error | bool | False | |
point | bool | True | |
gap_area | bool | True | |
control | bool | False | |
props | dict | {} |
f_targs[0].data.columns[~f_targs[0].mask.all()]
Index(['TA'], dtype='object')
plot_result(f_preds[0], f_targs[0], units=units)
with only_gap_ctx(learn):
    f_preds_gap, f_targs_gap = predict_items(learn.model, dls, items)
    display(plot_result(f_preds_gap[0], f_targs_gap[0]))
with only_gap_ctx(learn):
    f_preds_gap, f_targs_gap = predict_items(learn.model, dls, items)
    display(plot_result(f_preds_gap[0], f_targs_gap[0], hide_no_gap=True))
plot_result(f_preds[0], f_targs[0], control_map=hai_control)
plot_results
plot_results (preds, targs, metrics=None, n_cols=1, **kwargs)
plot_results(f_preds, f_targs, units=units)
plot_results(f_preds_gap, f_targs_gap, hide_no_gap=True, units=units)
plot_results(f_preds_gap, f_targs_gap, hide_no_gap=True, units=units, control_map=hai_control)
Show Results
random.choices(learn.dls.items, k=3)
[MeteoImpItem(i=14, shift=-10, var_sel=['TA'], gap_len=5),
MeteoImpItem(i=23, shift=-10, var_sel=['TA'], gap_len=5),
MeteoImpItem(i=17, shift=-10, var_sel=['TA'], gap_len=5)]
get_results
get_results (learn, n=3, items=None, dls=None)
preds, targs = get_results(learn)
len(preds)
[MeteoImpItem(i=42, shift=-10, var_sel=['TA'], gap_len=5), MeteoImpItem(i=43, shift=-10, var_sel=['TA'], gap_len=5), MeteoImpItem(i=47, shift=-10, var_sel=['TA'], gap_len=5)]
3
show_results
show_results (learn, n=3, items=None, dls=None, metrics=None, n_cols=1)
show_results(learn, control_map=hai_control)
[MeteoImpItem(i=43, shift=-10, var_sel=['TA'], gap_len=5), MeteoImpItem(i=47, shift=-10, var_sel=['TA'], gap_len=5), MeteoImpItem(i=41, shift=-10, var_sel=['TA'], gap_len=5)]
show_results(learn, items=items)
[MeteoImpItem(i=1, shift=0, var_sel=['TA'], gap_len=5), MeteoImpItem(i=2, shift=0, var_sel=['TA'], gap_len=5)]
Interactive
results_custom_gap
results_custom_gap (learn, df, control, items_idx, var_sel, gap_len, block_len, shift, control_lags)
plot_results(*results_custom_gap(learn, df=hai, control=hai_era,
                                 items_idx=[800, 801, 804],
                                 var_sel=['TA'], gap_len=10,
                                 block_len=200, control_lags=[1], shift=0))
CustomGap
CustomGap (learn, df, control)
Initialize self. See help(type(self)) for accurate signature.
CustomGap(learn, hai, hai_era).interact_results()
<function ipywidgets.widgets.interaction._InteractFactory.__call__.<locals>.<lambda>(*args, **kwargs)>
CustomGap(learn_gap, hai, hai_era).interact_results()
<function ipywidgets.widgets.interaction._InteractFactory.__call__.<locals>.<lambda>(*args, **kwargs)>
Extra Training
Interactive Sequence
InteractiveSequence
InteractiveSequence (s:Sequence, start=0)
Initialize self. See help(type(self)) for accurate signature.
InteractiveSequence([1, 2, 3])()
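A minimal sketch of such a browser using ipywidgets.interact (an assumption; the exported class may use buttons instead of a slider):

from ipywidgets import interact

class InteractiveSequenceSketch:
    "Browse a sequence element by element with an index slider."
    def __init__(self, s, start=0): self.s, self.start = s, start
    def __call__(self):
        return interact(lambda i: self.s[i], i=(self.start, len(self.s) - 1))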
Save model batch Callback
SaveModelsBatch
SaveModelsBatch (every=None, times_epoch=None)
Callback
that tracks the number of iterations done and properly sets training/eval mode
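A hedged sketch of such a callback, inferring the every/times_epoch semantics from the snapshots below (train iterations 1, 10 and 19):

from copy import deepcopy
from fastai.callback.core import Callback

class SaveModelsBatchSketch(Callback):
    "Snapshot a deep copy of the model every few training iterations."
    def __init__(self, every=None, times_epoch=None):
        self.every, self.times_epoch, self.models = every, times_epoch, []
    def before_fit(self):
        if self.every is None:  # spread `times_epoch` snapshots over one epoch
            self.every = max(1, len(self.dls.train) // self.times_epoch)
    def after_batch(self):
        if self.training and self.train_iter % self.every == 1:
            self.models.append({'train_iter': self.train_iter,
                                'model': deepcopy(self.model)})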
save_models = SaveModelsBatch(times_epoch=3)
learn = Learner(dls, model, KalmanLoss(only_gap=False), cbs=[save_models, Float64Callback])
learn.fit(1, 1e-1)
epoch | train_loss | valid_loss | time |
---|---|---|---|
0 | 42.632274 | 103.624711 | 00:03 |
save_models.models
[{'train_iter': 1,
'model': Kalman Filter
N dim obs: 3,
N dim state: 3,
N dim contr: 6},
{'train_iter': 10,
'model': Kalman Filter
N dim obs: 3,
N dim state: 3,
N dim contr: 6},
{'train_iter': 19,
'model': Kalman Filter
N dim obs: 3,
N dim state: 3,
N dim contr: 6}]
[MeteoImpItem(i=43, shift=-10, var_sel=['TA'], gap_len=5), MeteoImpItem(i=41, shift=-10, var_sel=['TA'], gap_len=5), MeteoImpItem(i=42, shift=-10, var_sel=['TA'], gap_len=5)]
[MeteoImpItem(i=43, shift=-10, var_sel=['TA'], gap_len=5), MeteoImpItem(i=41, shift=-10, var_sel=['TA'], gap_len=5), MeteoImpItem(i=42, shift=-10, var_sel=['TA'], gap_len=5)]
[MeteoImpItem(i=43, shift=-10, var_sel=['TA'], gap_len=5), MeteoImpItem(i=41, shift=-10, var_sel=['TA'], gap_len=5), MeteoImpItem(i=42, shift=-10, var_sel=['TA'], gap_len=5)]
CPU times: user 6.88 s, sys: 102 ms, total: 6.98 s
Wall time: 3.47 s
InteractiveSequence(save_model_plots)()